When we have an environment in HA, a node or network failure can cause a desynchronization, with the slave node taking over as master to replace the failed one. This is known as a “Broken Node” and the following process should be followed to solve it:
In summary, once the fallen node has been recovered, we will copy a backup of the main database to it and resynchronize it through Percona, Pacemaker and Corosync. Assuming the nodes are named node1 and node2, with node2 being the one that has fallen:
We put node2 in standby mode by means of the following command:
node2# pcs node standby node2
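Optionally, before continuing you can confirm that node2 really entered standby by listing the node states; this verification step is an addition, not part of the original procedure:
node2# pcs status nodes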
We make a backup of node2's Percona data directory, just in case, although we will not use it:
node2# systemctl stop mysqld
node2# [ -e /var/lib/mysql.bak ] && rm -rf /var/lib/mysql.bak
node2# mv /var/lib/mysql /var/lib/mysql.bak
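It may also be worth checking that the mysqld service actually stopped and that the old data directory was moved aside before proceeding; this quick check is an assumption on our part, not part of the original steps:
node2# systemctl status mysqld
node2# ls -ld /var/lib/mysql.bak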
We make a backup of the database on the master node (node1 in this example) and update the cluster's record of the master node name and the name and position of the master binary log file (in this example node1, mysql-bin.000001 and 785):
node1# [ -e /root/pandoradb.bak ] && rm -rf /root/pandoradb.bak
node1# innobackupex --no-timestamp /root/pandoradb.bak/
node1# innobackupex --apply-log /root/pandoradb.bak/
node1# binlog_info=$(cat /root/pandoradb.bak/xtrabackup_binlog_info)
node1# crm_attribute --type crm_config --name pandoradb_REPL_INFO -s mysql_replication -v "node1|$(echo $binlog_info | awk '{print $1}')|$(echo $binlog_info | awk '{print $2}')"
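The xtrabackup_binlog_info file written by innobackupex records the binary log file name and position at the moment of the backup; with the example values used in this document it would look roughly like the line below, which is what the two awk expressions above extract as fields 1 and 2:
mysql-bin.000001	785
If you want to confirm that the replication attribute was stored in the cluster configuration, crm_attribute can read it back; this is a hedged example and the exact output format may vary between Pacemaker versions:
node1# crm_attribute --type crm_config --name pandoradb_REPL_INFO -s mysql_replication --query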
We copy the database from node1 to node2:
node1# rsync -avpP -e ssh /root/pandoradb.bak/ node2:/var/lib/mysql/
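As an extra check that is not in the original procedure, you can compare the size of the copied data on both nodes to confirm the transfer completed; the figures should be roughly the same:
node1# du -sh /root/pandoradb.bak/
node2# du -sh /var/lib/mysql/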
node2# chown -R mysql:mysql /var/lib/mysql
node2# chcon -R system_u:object_r:mysqld_db_t:s0 /var/lib/mysql
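To verify that the ownership and SELinux context were applied, you can list the directory with its security labels; this is an optional check, and the exact context shown may differ depending on your SELinux policy:
node2# ls -ldZ /var/lib/mysql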
We take node2 out of standby mode and clean up the resource errors:
node2# pcs node unstandby node2
node2# pcs resource cleanup --node node2
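After the cleanup, checking the overall cluster status helps confirm that the resources are starting again on node2; this additional verification is not part of the original steps:
node2# pcs status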
We check the status of the database replication:
node2# mysql -uroot -ppandora
mysql> SHOW SLAVE STATUS\G
Make sure that Slave_IO_Running and Slave_SQL_Running show Yes in the last output.
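As a non-interactive alternative to opening a MySQL session, the same check can be done in a single command; this is just a convenience sketch assuming the same root credentials as above:
node2# mysql -uroot -ppandora -e 'SHOW SLAVE STATUS\G' | grep -E 'Slave_(IO|SQL)_Running'
Both lines should print Yes; if either shows No or Connecting, review the Last_IO_Error and Last_SQL_Error fields in the full SHOW SLAVE STATUS\G output.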