AlwaysOn Cluster Server Status Flipping Between Red & Green

Overview

The server status (shown below with red arrow) in the Cluster Info section in Live Monitor switches frequently between Red & Green if the alerting service for the AlwaysOn HA cluster isn't configured correctly.

serverstatus.png

 

Solution

Pre-Requisites

Microsoft SQL Server Management Studio (SSMS)

Diagnosis

<supportagent>

Ask the customer to share the log file at /data/logs/services/always_on_monitor.log.

</supportagent>

Check the log file at/data/logs/services/always_on_monitor.log for following entries:

YYYY-MM-DD HH:MM:SS.sss ERROR AlwaysonMonitor(1): Error while fetching group database id: ('42000', '[42000] [FreeTDS][SQL Server]The user does not have permission to perform this action. (297) (SQLExecDirectW)')
YYYY-MM-DD HH:MM:SS.sss ERROR AlwaysOnMonitor(1): Problem determining current master: ('42000', '[42000] [FreeTDS][SQL Server]The user does not have permission to perform this action. (297) (SQLExecDirectW)')
YYYY-MM-DD HH:MM:SS.sss ERROR AlwaysOnMonitor(1): Failed to inform core about server role/type :  [INFO] The failover command did not change any role/role-setting.
YYYY-MM-DD HH:MM:SS.sss ERROR AlwaysOnMonitor(1): Problem determining health status from hadr_avail: ('42000', '[42000] [FreeTDS][SQL Server]The user does not have permission to perform this action. (297) (SQLExecDirectW)')

This indicates a permission issue for the ScaleArc user to query tables holding AlwaysON status, effectively meaning that these queries are not executing properly with the ScaleArc kerberized user:

SELECT database_name, group_database_id FROM sys.availability_databases_cluster WHERE group_id = <param> and database_name in <param>;
SELECT a.endpoint_url, a.replica_server_name, b.role, b.connected_state, b.operational_state FROM sys.availability_replicas a, master.sys.dm_hadr_availability_replica_states b WHERE a.group_id = b.group_id and a.replica_id = b.replica_id and a.group_id='<param>';

SELECT endpoint_url, replica_server_name from sys.availability_replicas WHERE replica_server_name = (SELECT primary_replica FROM sys.dm_hadr_availability_group_states);

Steps To Fix

You must ensure that the ScaleArc user has full access to the HA system tables (referenced in the queries above) to be able to determine the AlwaysON status and detect a role change in case an AlwaysON failover is performed from the database cluster side.

Log in to the database servers as an Administrator of the affected cluster with SSMS, and in the Object Explorer, navigate to Security > Logins for each database and make sure that the following permissions are enabled on the login properties of each ScaleArc account on the databases:

  • Connect to Any database
  • Create Any database
  • View Any database
  • View definition
  • View server state

 

Testing

You can test this by performing an HA failover and the alerts will clear out on all instances of the ScaleArc HA cluster. No new errors relating to this will be logged in /data/logs/services/always_on_monitor.log & the cluster remains in green as expected.

Comments

0 comments

Please sign in to leave a comment.

Was this article helpful?
0 out of 0 found this helpful
Have more questions? Submit a request