Split-Brain Resolution Process

Overview

A split-brain situation is a state in which each ScaleArc node in an HA setup believes it is the only surviving active node.

This article provides the steps to quickly recover from an HA split-brain situation if it occurs.

Prerequisites

  • Access to the ScaleArc appliance via SSH
  • Credentials for the idb user

Solution

Split-brain resolution is achieved using a witness server, which is not a ScaleArc node.

The witness server is the system that keeps track of the current Primary node and, in case of a split-brain, determines which node is most capable of taking over resources and becoming the new Primary.

Follow these steps to resolve the split-brain situation for older releases up to v3.11.0.2:

  1. Identify which ScaleArc node was the Secondary before the split-brain situation occurred. If you are not sure, try the two methods below to identify which node was secondary:
    1. # cat /etc/ha.d/haresources
      • The host displayed here is usually the primary
    2. # cat /logs/ha_idb.log
      • This could reveal the primary/secondary instance prior to the split-brain situation
  2. Log in to the Secondary ScaleArc UI, navigate to "SETTINGS > HA Settings", and click the Restart button. This restarts the heartbeat on the Secondary ScaleArc and attempts to auto-resolve the split-brain.
  3. If the above step does NOT resolve the issue, perform the following:
    1. SSH into the Secondary ScaleArc and stop the heartbeat service using the command below:
      • # sudo service heartbeat stop
    2. Log in to the Primary ScaleArc UI and restart heartbeat.
    3. The two steps above should resolve the split-brain situation.
    4. Now start the heartbeat service on the Secondary ScaleArc using this command:
      • # sudo service heartbeat start
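
The haresources check used to identify the nodes can be scripted. The sketch below is illustrative only and is not part of the ScaleArc tooling: it assumes the file follows the standard Heartbeat layout, in which each resource line begins with the name of the preferred node, and that node names match the output of uname -n. The preferred_node helper and the HARESOURCES variable are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Sketch: print the preferred (usually primary) node named in a Heartbeat
# haresources file and compare it with this host's name.
# Assumption: each resource line starts with the preferred node name.

HARESOURCES="${HARESOURCES:-/etc/ha.d/haresources}"

# preferred_node FILE: print the first field of the first line that is
# neither blank nor a comment.
preferred_node() {
    awk '!/^[ \t]*(#|$)/ { print $1; exit }' "$1"
}

if [ -r "$HARESOURCES" ]; then
    node=$(preferred_node "$HARESOURCES")
    echo "Preferred node in $HARESOURCES: $node"
    if [ "$node" = "$(uname -n)" ]; then
        echo "This host is the preferred (usually primary) node."
    else
        echo "This host is not the preferred node."
    fi
else
    echo "Cannot read $HARESOURCES" >&2
fi
```

Run it on each ScaleArc node. Because the host listed in haresources is only usually the primary, treat the result as a hint and cross-check it against /logs/ha_idb.log before restarting or stopping heartbeat.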

For newer versions released after v3.11.0.2 (i.e. 3.11.0.4, 3.12 or 20XX.X and later releases), follow these steps to resolve reported Fencing/Split Brain Resolution warnings:

  1. Confirm the HA service status on Primary and Secondary. This is done by navigating to SETTINGS > HA Settings on both the primary and secondary nodes to verify that the HA Service is running.
  2. Set the fencing cluster to one of the available clusters, then change it to a different cluster, and finally change it back to the original one.

    [Image: Fencing_options.png]

  3. The HA alerts should stop being logged, confirming that HA is now operating as expected.
