Overview

A split-brain situation is a state in which each ScaleArc node in an HA setup believes it is the only active surviving node.

This article provides the steps to quickly recover from an HA split-brain situation if one occurs.

Prerequisites

Access to the ScaleArc appliance via SSH
Credentials for the idb user

Solution

Split-brain resolution is achieved using a witness server, which is not a ScaleArc node.

The witness server is the system that keeps track of the current Primary node. In case of a split-brain, it determines which node is most capable of taking over resources and becoming the new Primary.

Follow these steps to resolve the split-brain situation for older releases up to v3.11.0.2:

Identify which ScaleArc node was the Secondary before the split-brain occurred. If you are not sure, try the two methods below:
1. Check the heartbeat resources file:
   ```
   # cat /etc/ha.d/haresources
   ```
   - The host displayed here is usually the Primary.
2. Check the HA log:
   ```
   # cat /logs/ha_idb.log
   ```
   - This may reveal which instance was Primary and which was Secondary prior to the split-brain situation.
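The first check above can be scripted. The following is a minimal sketch, assuming the file follows the usual heartbeat `haresources` layout, where each non-comment line begins with a node name, and that entries are not split across continuation lines. The function names `haresources_nodes` and `is_listed_node` are hypothetical helpers introduced here for illustration; they are not part of ScaleArc.

```shell
# Hypothetical helper: print the node name at the start of each entry.
# Assumes one entry per line (no backslash continuation lines).
haresources_nodes() {
    file="${1:-/etc/ha.d/haresources}"
    # Skip blank lines and comment lines; print the first field of the rest.
    awk 'NF && $1 !~ /^#/ { print $1 }' "$file"
}

# Hypothetical helper: succeed if this host's name is listed in the file.
is_listed_node() {
    haresources_nodes "$1" | grep -qxF "$(uname -n)"
}
```

Running `haresources_nodes` on each node and comparing the result with `uname -n` gives a quick indication of which node the file treats as the Primary. Treat the result as a hint only and confirm it against the HA log.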

Log in to the Secondary ScaleArc UI, navigate to "SETTINGS > HA Settings", and click the Restart button. This restarts the heartbeat on the Secondary ScaleArc and attempts to auto-resolve the split-brain.
If the above step does NOT resolve the issue, perform the following:
1. SSH into the Secondary ScaleArc and stop the heartbeat service:
   ```
   # sudo service heartbeat stop
   ```
2. Log in to the Primary ScaleArc UI and restart the heartbeat.
3. The two steps above should resolve the split-brain situation.
4. Start the heartbeat service on the Secondary ScaleArc again:
   ```
   # sudo service heartbeat start
   ```
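The stop and start commands in the steps above can be wrapped in a small helper so that only those two actions can be issued on the Secondary. This is a sketch only: `secondary_heartbeat` and the `HB_CTL` variable are hypothetical names introduced for illustration, and the restart on the Primary still has to be done through the UI between the two calls.

```shell
# Command prefix used to control heartbeat; override HB_CTL for a dry run,
# for example HB_CTL="echo heartbeat".
HB_CTL="${HB_CTL:-sudo service heartbeat}"

# Hypothetical helper: accept only "stop" or "start", as used in the steps above.
secondary_heartbeat() {
    case "$1" in
        stop|start)
            # HB_CTL is intentionally unquoted so it splits into command + args.
            $HB_CTL "$1"
            ;;
        *)
            echo "usage: secondary_heartbeat stop|start" >&2
            return 2
            ;;
    esac
}
```

On the Secondary, call `secondary_heartbeat stop`, restart the heartbeat from the Primary UI, and then call `secondary_heartbeat start`.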

For newer versions released after v3.11.0.2 (i.e. 3.11.0.4, 3.12 or 20XX.X and later releases), follow these steps to resolve reported Fencing/Split Brain Resolution warnings:

1. Confirm the HA service status on both nodes: navigate to SETTINGS > HA Settings on the Primary and on the Secondary, and verify that the HA Service is running.
2. Set the fencing cluster to one of the clusters, then change it to a different cluster, and finally change it back to the original one.

The HA alerts should then stop being logged, confirming that HA is operating as expected.
