
Pandora FMS HA Broken Node Recovery

In an HA environment, a node or network failure can cause the databases to desynchronize, with the slave node taking over as master to replace the failed one. This situation is known as a “Broken Node”, and it is resolved with the following process:

In summary, once the failed node has been recovered, we copy a backup of the master database to it and resynchronize it through Percona, Pacemaker and Corosync. The examples assume that the nodes are named node1 and node2, and that node2 is the one that failed:

We put node2 in standby with the following command:

node2# pcs node standby node2

We make a backup of the Percona data directory of node2 just in case, although we will not use it:

node2# systemctl stop mysqld
node2# [ -e /var/lib/mysql.bak ] && rm -rf /var/lib/mysql.bak
node2# mv /var/lib/mysql /var/lib/mysql.bak
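
The remove-then-move pattern used above can be wrapped in a small function so that it can be rehearsed on a scratch directory before touching /var/lib/mysql. This is only a minimal sketch: rotate_backup is a hypothetical helper name, not part of Pandora FMS or Percona, and it assumes that any previous .bak copy can be discarded.

```shell
# rotate_backup DIR: move DIR to DIR.bak, discarding any previous DIR.bak.
# Hypothetical helper for illustration; stop mysqld before using it on a data directory.
rotate_backup() {
    dir=$1
    # Refuse to continue if the source directory does not exist.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Remove the old backup (if any), then move the current directory aside.
    rm -rf -- "$dir.bak"
    mv -- "$dir" "$dir.bak"
}

# Example (on node2, with mysqld stopped):
#   rotate_backup /var/lib/mysql
```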

We make a backup of the master node database (node1 in this example) and update, in the cluster, the master node name and the name and position of the master binary log file (in this example node1, mysql-bin.000001 and 785):

node1# [ -e /root/pandoradb.bak ] && rm -rf /root/pandoradb.bak
node1# innobackupex --no-timestamp /root/pandoradb.bak/
node1# innobackupex --apply-log /root/pandoradb.bak/
node1# binlog_info=$(cat /root/pandoradb.bak/xtrabackup_binlog_info)
node1# crm_attribute --type crm_config --name pandoradb_REPL_INFO -s mysql_replication -v "node1|$(echo $binlog_info | awk '{print $1}')|$(echo $binlog_info | awk '{print $2}')"
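
The value passed to crm_attribute above is the master name, the binary log file and the position joined with "|". As an illustration of how that string is derived, the sketch below builds it from the first line of xtrabackup_binlog_info, so the result can be inspected before it is written to the cluster. build_repl_info is a hypothetical helper, and the sketch assumes the file's first line holds the log file name and position separated by whitespace.

```shell
# build_repl_info MASTER FILE: print "MASTER|binlog file|position".
# Hypothetical helper; FILE is expected to look like xtrabackup_binlog_info,
# i.e. its first line starts with "<binlog file> <position>".
build_repl_info() {
    master_name=$1
    info_file=$2
    # Only the first line is used; any extra columns on it are ignored.
    awk -v master="$master_name" \
        'NR == 1 { printf "%s|%s|%s\n", master, $1, $2 }' "$info_file"
}

# Example (on node1):
#   build_repl_info node1 /root/pandoradb.bak/xtrabackup_binlog_info
```

With the values from this example, the printed string would be node1|mysql-bin.000001|785.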

We load the database from node1 to node2:

node1# rsync -avpP -e ssh /root/pandoradb.bak/ node2:/var/lib/mysql/
node2# chown -R mysql:mysql /var/lib/mysql
node2# chcon -R system_u:object_r:mysqld_db_t:s0 /var/lib/mysql

We deactivate the standby mode of node2 and clean the errors:

node2# pcs node unstandby node2
node2# pcs resource cleanup --node node2

We check the status of the database replication:

node2# mysql -uroot -ppandora
mysql> SHOW SLAVE STATUS \G

Make sure that Slave_IO_Running and Slave_SQL_Running both show Yes in the output of SHOW SLAVE STATUS.
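
The same check can be scripted. The following is a minimal sketch, assuming the status text comes from SHOW SLAVE STATUS in vertical (\G) format and is piped in on standard input. replication_ok is a hypothetical helper name, not a Pandora FMS or MySQL command.

```shell
# replication_ok: read SHOW SLAVE STATUS\G output on stdin and succeed only
# if both Slave_IO_Running and Slave_SQL_Running are "Yes".
# Hypothetical helper for illustration.
replication_ok() {
    awk -F': *' '
        # The trailing colon keeps Slave_SQL_Running_State from matching.
        /^ *Slave_IO_Running:/  { io = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example (on node2):
#   mysql -uroot -ppandora -e 'SHOW SLAVE STATUS\G' | replication_ok \
#       && echo "replication running"
```

A non-zero exit status means that at least one of the two threads is not running, or that the fields were not found in the input.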
