Overview
The ScaleArc health monitor reports that the witness server is unreachable by logging the following alert:
HA Alert: Witness server unreachable
The ScaleArc logs also contain the following errors, usually with the same error repeated multiple times:
ERROR FAILED: Witness server setup (65280, 'ssh_exchange_identification: read: Connection reset by peer\r').
ERROR FAILED: Witness server setup (65280, 'ssh_exchange_identification: Connection closed by remote host\r').
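To see which of the two SSH failures dominates, the matching log lines can be counted by message. The following is a minimal sketch: the function name is hypothetical, and the path of the ScaleArc log file is not given in this article, so it is passed in by the caller.

```shell
# Hypothetical helper: summarise witness-server SSH failures in a log file.
# $1 is the path to a ScaleArc log file (supplied by the caller; this
# article does not state where the log lives).
summarize_witness_errors() {
    # Keep only the witness setup failures, extract the SSH failure text,
    # then count how many times each distinct message occurs.
    grep -F 'ERROR FAILED: Witness server setup' "$1" \
        | grep -oE 'Connection (reset by peer|closed by remote host)' \
        | sort | uniq -c
}
```

Call it as summarize_witness_errors followed by the log file path; each output line shows a count and the failure message it belongs to.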
Environment
- MSSQL Server with an AlwaysOn cluster
- HA enabled with 'ScaleArc Cluster' configured as the fencing option
- HA enabled with 'SSH Server' configured as the fencing option
Solution
This alert can be generated in MSSQL HA environments that rely on a ScaleArc Cluster as the fencing option, or where an SSH Server configured as the witness server experiences a network outage.
The 'witness server unreachable' alert is expected to appear only when the cluster is down in an HA setup that relies on a ScaleArc cluster to resolve split-brain situations.
If it continues to appear despite the cluster being up, ensure both ScaleArc nodes have full access to the SQL Servers, as HA creates a small database prefixed with SA_* (e.g. SA_024571ec_6b8) and a table in this database to continuously update and query the HA status.
This database should be made part of the AlwaysOn Availability Group in SQL Server. Refer to the external article Creating AlwaysOn Availability Groups in SQL Server for detailed instructions.
ScaleArc uses the HA cluster name to name this database when HA fencing is configured to use a ScaleArc Cluster, which is the default and recommended option. You can find the HA cluster name by running the following command in an SSH session on the Primary or Secondary node:
# pcs status | grep name
Cluster name: SA_aeaa3fae_30e
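If only the name itself is needed, for example to look for the matching SA_* database on the SQL Servers, the value can be extracted from the pcs output. This is a minimal sketch; the function name is hypothetical, and it assumes the output contains a line in the "Cluster name: <name>" form shown above.

```shell
# Hypothetical helper: read `pcs status` output on stdin and print only
# the cluster name. Assumes a line of the form "Cluster name: <name>".
cluster_name_from_pcs() {
    # Split on ": " and print the second field of the first matching line.
    awk -F': *' '/^Cluster name:/ { print $2; exit }'
}

# Usage on a ScaleArc node:
#   pcs status | cluster_name_from_pcs
```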
Alternatively, to achieve HA independent of cluster status, you can configure one of the other two supported fencing options, i.e. an External database or SSH fencing, as documented in Set Up High Availability.
The alert can also be encountered in the SSH Server fencing scenario, in which case the root cause could be rate limiting on the SSH server or on network/firewall equipment between the ScaleArc servers and the SSH witness server.
Further troubleshooting will require the customer to provide information on the Operating System running on the SSH witness server as well as the following log and configuration files:
/var/log/messages
/var/log/auth.log
/etc/ssh/sshd_config
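When reviewing the sshd_config file for rate limiting, the OpenSSH directives that govern unauthenticated connections are a reasonable first place to look. The sketch below prints the active MaxStartups, MaxSessions and LoginGraceTime lines from a given file. The function name is hypothetical, and the choice of directives is an assumption about what is most likely relevant, not a complete list.

```shell
# Hypothetical helper: print active (uncommented) sshd_config directives
# that limit connections. $1 is the path to an sshd_config file, for
# example a copy supplied by the customer.
show_sshd_limits() {
    # Case-insensitive match, since sshd_config keywords are not
    # case-sensitive; commented-out lines are skipped by the anchor.
    grep -iE '^[[:space:]]*(MaxStartups|MaxSessions|LoginGraceTime)[[:space:]]' "$1"
}

# Usage:
#   show_sshd_limits /etc/ssh/sshd_config
```

No output means none of these directives is set explicitly, in which case sshd falls back to its built-in defaults.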
If the above is insufficient to isolate the root cause, further investigation can be carried out by taking tcpdump traffic captures with the help of the ScaleArc Support team.
Testing
The 'Witness server unreachable' alert should go away after putting the SA_* database into the AlwaysOn Availability Group in MSSQL or configuring either of the other two supported fencing options described in the Solution section.