Database High Availability on CentOS 7
CentOS 7 will soon reach its End of Life (EOL). This documentation is preserved for historical purposes.
High Availability in the Database
The objective of this section is to offer a complete solution for HA in Pandora FMS environments. This is the only HA model with official support for Pandora FMS. This solution is provided pre-installed from OUM 724 to 769. This system replaces DRBD and other HA systems recommended in the past (version 724 and earlier).
See “Migration of HA Corosync-Pacemaker environments” to update the high availability database used in version 770.
This is the first implementation of Pandora FMS DB HA and the installation process is almost entirely manual, using the GNU/Linux console as root. In future versions we will provide configuration from the graphical interface (GUI).
Pandora FMS is based on a MySQL database to configure and store data. A database failure can momentarily paralyze the monitoring tool. The Pandora FMS high availability database cluster allows you to easily deploy a robust and fault tolerant architecture.
This is an advanced feature that requires knowledge of GNU/Linux systems.
Cluster resources are managed with Pacemaker, an advanced and scalable high availability cluster resource manager. Corosync provides a closed process group communication model for creating replicated state machines. Percona was chosen as the default RDBMS for its scalability, availability, security, and backup features.
Replication is active/passive, from a single master node (with write permission) to any number of secondary nodes (read only). A virtual IP address always points to the current master. If the master node fails, one of the secondary nodes is promoted to master and the virtual IP address is updated accordingly.
The Pandora FMS HA database tool, pandora_ha, monitors the cluster and makes sure that the Pandora FMS server is in continuous operation, restarting it if necessary. pandora_ha is in turn monitored by systemd.
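As a quick check (not part of the original procedure), you can ask systemd about the service; the unit name pandora_ha is an assumption and may differ depending on how the server was installed:

pandorafms# systemctl status pandora_ha      # assumed unit name; adjust to your installation
pandorafms# journalctl -u pandora_ha -n 50   # last log lines, assuming the same unit name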
It is recommended to keep a maximum of 15 days of data and events; to store data for a longer period, a historical database must be configured. Also consult the topic “Management and administration of PFMS servers”.
This example will set up a two-node cluster, with hosts node1 and node2. Change hostnames, passwords, etc. as needed to match the environment to be implemented.
Version 759 or earlier.
Commands that need to be executed on a node will be preceded by the hostname of that node:
node1# <command>
Commands that need to be executed on all nodes will be preceded by the word all:
all# <command>
There is an additional host, called pandorafms, where Pandora FMS is or will be installed.
The word all refers only to the database nodes; the additional Pandora FMS node will always be referenced as pandorafms and is not part of all.
Prerequisites
CentOS version 7 must be installed on all hosts, and they must be able to resolve each other's hostnames.
node1# ping node2
PING node2 (192.168.0.2) 56(84) bytes of data.

node2# ping node1
PING node1 (192.168.0.1) 56(84) bytes of data.

pandorafms# ping node1
PING node1 (192.168.0.1) 56(84) bytes of data.

pandorafms# ping node2
PING node2 (192.168.0.2) 56(84) bytes of data.
An OpenSSH server must be installed and running on each host. Suppress the notice that OpenSSH shows:
all# [ -f /etc/cron.hourly/motd_rebuild ] && rm -f /etc/cron.hourly/motd_rebuild
all# sed -i -e 's/^Banner.*//g' /etc/ssh/sshd_config
all# systemctl restart sshd
The Pandora FMS HA database tool will not work correctly if OpenSSH has a warning configured.
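An optional way to confirm that no banner remains is to dump the effective sshd configuration; this is a sketch assuming the stock OpenSSH shipped with CentOS 7:

all# sshd -T | grep -i banner    # should print "banner none"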
Generate new SSH authentication keys for each host and copy the public key to each of the other hosts:
Keys can be generated for a non-root user for a later non-root cluster installation.
node1# echo -e "\n\n\n" | ssh-keygen -t rsa
node1# ssh-copy-id -p22 root@node2
node1# ssh node2

node2# echo -e "\n\n\n" | ssh-keygen -t rsa
node2# ssh-copy-id -p22 root@node1
node2# ssh node1

pandorafms# echo -e "\n\n\n" | ssh-keygen -t rsa
pandorafms# ssh-copy-id -p22 root@node1
pandorafms# ssh-copy-id -p22 root@node2
pandorafms# ssh node1
pandorafms# ssh node2
On the Pandora FMS node, copy the key pair to /usr/share/httpd/.ssh/. The Pandora FMS console needs it to retrieve the cluster status:
pandorafms# cp -r /root/.ssh/ /usr/share/httpd/
pandorafms# chown -R apache:apache /usr/share/httpd/.ssh/
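Optionally, you can confirm that the web server user can reach both nodes without prompts. This sketch assumes the console runs as the apache user and that its home directory is /usr/share/httpd:

pandorafms# sudo -u apache -H ssh -o BatchMode=yes node1 'hostname'
pandorafms# sudo -u apache -H ssh -o BatchMode=yes node2 'hostname'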
The following steps are required only if the nodes are running SSH on a non-standard port. You must replace 22 with the correct port number:
all# echo -e "Host node1\n Port 22">> /root/.ssh/config all# echo -e "Host node2\n Port 22">> /root/.ssh/config
Installation of Percona
Install the necessary package:
all# yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
all# yum install -y Percona-Server-server-57 percona-xtrabackup-24
For more information regarding the Percona installation process, you can consult the official product documentation.
Once the packages are installed, make sure that the Percona service is disabled, as it will be managed by the cluster:
all# systemctl disable mysqld
If the system service is not disabled, the cluster resource manager will not work correctly.
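To double-check, the following should report disabled on every node (the service can still be started manually; it simply will not start on its own at boot):

all# systemctl is-enabled mysqld
disabled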
Next, start the Percona server:
all# systemctl start mysqld
A new temporary password will be generated and written to /var/log/mysqld.log. Connect to the Percona server and change the root password:
all# mysql -uroot -p$(grep "temporary password" /var/log/mysqld.log | \
rev | cut -d' ' -f1 | rev)
mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('Pandor4!');
mysql> UNINSTALL PLUGIN validate_password;
mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('pandora');
mysql> quit
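A minimal sanity check, assuming the root password was set to pandora as above:

all# mysql -uroot -ppandora -e "SELECT VERSION();"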
Reinstall the server package with the HA option (--ha flag):
pandorafms# ./pandora_server_installer --install --ha
Once the server is installed with the HA tools enabled, you will find the configuration generator for database replication in the path: /usr/share/pandora_server/util/myconfig_ha_gen.sh
Example: ./myconfig_ha_gen.sh -i serverid [-l file_location] [-d database] [-b binlog] [-u dbuser] [-p dbpass] [-s poolsize] [-h help]

Mandatory parameters:
    -i --serverid   Set the server id for the database (Mandatory)

Optional parameters:
    -l --location   Set my.cnf custom location including filename. [ default value: /etc/my.cnf ] (optional)
    -d --database   Set the database to be replicated. [ default value: pandora ] (optional)
    -b --binlog     Set binlog file. [ default value: mysql-bin ] (optional)
    -u --dbuser     Set dbuser for mysql connection and backups. [ default value: root ] (optional)
    -p --dbpass     Set dbpassword for mysql connection and backups. [ default value: pandora ] (optional)
    -s --poolsize   Set innodb_buffer_pool_size static size in M (Megabytes) or G (Gigabytes). [ default value: autocalculated ] (optional)
    -h --help       Print help.
In this case, where the databases are not on the same server as the application, it will be necessary to copy the script to the nodes so that it can be executed locally:
pandorafms# scp /usr/share/pandora_server/util/myconfig_ha_gen.sh root@node1:/root/
pandorafms# scp /usr/share/pandora_server/util/myconfig_ha_gen.sh root@node2:/root/
It will only be necessary to pass the serverid parameter (mandatory) in standard environments and some optional parameters for custom environments.
If the default or defined user does not connect to the database, the script will end with a connection error.
The database, the user and the password can also be passed as arguments. If they are not provided, the default values will be used.
In this case, run the script on both nodes passing only the server id if the default credentials are used; otherwise, define the necessary parameters.
node1# /root/myconfig_ha_gen.sh -i 1
node2# /root/myconfig_ha_gen.sh -i 2
Important: Each node must have a unique identifier
The Percona configuration file will be written to /etc/my.cnf, where the server identifier and the recommended configuration for database replication are defined.
You must restart the mysqld service to verify that the configuration has been applied correctly.
all# systemctl restart mysqld
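To confirm that the generated configuration is in effect, you can query the running server; this sketch assumes the default credentials, and each node should report its own server id together with mysql-bin as the binary log base name:

node1# mysql -uroot -ppandora -e "SHOW VARIABLES LIKE 'server_id'; SHOW VARIABLES LIKE 'log_bin_basename';"
node2# mysql -uroot -ppandora -e "SHOW VARIABLES LIKE 'server_id'; SHOW VARIABLES LIKE 'log_bin_basename';"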
Installation of Pandora FMS
New installation of Pandora FMS
Install Pandora FMS in the newly created database. Stop the Pandora FMS server:
pandorafms# /etc/init.d/pandora_server stop
As of version NG 754, additional options are available for manual startup and shutdown of High Availability (HA) environments.
Existing Pandora FMS installation
Stop the Pandora FMS server:
pandorafms# /etc/init.d/pandora_server stop
Make a backup of the Pandora FMS database:
pandorafms# mysqldump -uroot -ppandora --databases pandora > /tmp/pandoradb.sql
pandorafms# scp /tmp/pandoradb.sql node1:/tmp/
Now load the information to the new database:
node1# mysql -uroot -ppandora pandora -e "source /tmp/pandoradb.sql"
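As an optional verification (not part of the original procedure), confirm that the schema was imported, for example by counting the tables of the pandora database:

node1# mysql -uroot -ppandora -e "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='pandora';"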
Replication Settings
Grant the necessary privileges for replication to work on all databases:
all# mysql -uroot -ppandora
mysql> GRANT ALL ON pandora.* TO 'root'@'%' IDENTIFIED BY 'pandora';
mysql> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'root'@'%' IDENTIFIED BY 'pandora';
mysql> GRANT REPLICATION CLIENT, REPLICATION SLAVE, SUPER, PROCESS, RELOAD ON *.* TO 'root'@'localhost' IDENTIFIED BY 'pandora';
mysql> GRANT SELECT ON mysql.user TO 'root'@'%' IDENTIFIED BY 'pandora';
mysql> FLUSH PRIVILEGES;
mysql> quit
Back up the database on the first node and write down the name and position of the master log file (in the example, mysql-bin.000001 and 785):
node1# [ -e /root/pandoradb.bak ] && rm -rf /root/pandoradb.bak
node1# innobackupex --no-timestamp /root/pandoradb.bak/
node1# innobackupex --apply-log /root/pandoradb.bak/
node1# cat /root/pandoradb.bak/xtrabackup_binlog_info
mysql-bin.000001        785
Load the database on the second node and configure it to replicate from the first node (you must set MASTER_LOG_FILE and MASTER_LOG_POS to the values from the previous step):
node2# systemctl stop mysqld

node1# rsync -avpP -e ssh /root/pandoradb.bak/ node2:/var/lib/mysql/

node2# chown -R mysql:mysql /var/lib/mysql
node2# chcon -R system_u:object_r:mysqld_db_t:s0 /var/lib/mysql
node2# systemctl start mysqld
node2# mysql -uroot -ppandora
mysql> CHANGE MASTER TO MASTER_HOST='node1', MASTER_USER='root', MASTER_PASSWORD='pandora', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=785;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: node1
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 785
               Relay_Log_File: node2-relay-bin.000003
                Relay_Log_Pos: 998
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: pandora
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 785
              Relay_Log_Space: 1252
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 580d8bb0-6991-11e8-9a22-16efadb2f150
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

mysql> QUIT
You must ensure that Slave_IO_Running and Slave_SQL_Running return Yes. Other values may be different from the example.
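A compact way to re-check replication health at any time, assuming the default credentials; both threads should report Yes and the lag should stay close to 0:

node2# mysql -uroot -ppandora -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'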
It is recommended not to use the root user to carry out this process. It is advisable to give permissions to another user in charge of managing the database to avoid possible conflicts.
Two Node Cluster Configuration
Install the necessary packages:
all# yum install -y epel-release corosync ntp pacemaker pcs

all# systemctl enable ntpd
all# systemctl enable corosync
all# systemctl enable pcsd

all# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

all# systemctl start ntpd
all# systemctl start corosync
all# systemctl start pcsd
Stop the Percona server:
node1# systemctl stop mysqld
node2# systemctl stop mysqld
Authentication of all nodes in the cluster
Create and start the cluster:
all# echo hapass | passwd hacluster --stdin
node1# pcs cluster auth -u hacluster -p hapass --force node1 node2
node1# pcs cluster setup --force --name pandoraha node1 node2
node1# pcs cluster start --all
node1# pcs cluster enable --all
node1# pcs property set stonith-enabled=false
node1# pcs property set no-quorum-policy=ignore
Check the status of the cluster:
node1# pcs status
Cluster name: pandoraha
Stack: corosync
Current DC: node1 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Fri Jun  8 12:53:49 2018
Last change: Fri Jun  8 12:53:47 2018 by root via cibadmin on node1

2 nodes configured
0 resources configured

Online: [ node1 node2 ]

No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Both nodes should be online (Online: [ node1 node2 ]). Other values may be different from the example.
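If one of the nodes does not show up as online, a couple of hedged checks can help narrow the problem down (corosync membership and the corosync view reported by pcs):

node1# corosync-cmapctl | grep members
node1# pcs status corosync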
Installing the Percona Pacemaker Replication Agent
It can be downloaded manually from the PFMS library.
all# cd /usr/lib/ocf/resource.d/
all# mkdir percona
all# cd percona
all# curl -L -o pacemaker_mysql_replication.zip https://pandorafms.com/library/wp-content/uploads/2019/12/pacemaker_mysql_replication.zip
all# unzip pacemaker_mysql_replication.zip
all# rm -f pacemaker_mysql_replication.zip
all# chmod u+x mysql
Configure the cluster resources. Replace <VIRT_IP> with your preferred virtual IP address:
If the default password used in this guide for the database user root has been changed, update replication_passwd and test_passwd accordingly.
The names of the cluster resources must be exactly those indicated in this guide (pandoraip and pandoradb).
node1# export VIP='<VIRT_IP>'

node1# pcs resource create pandoradb ocf:percona:mysql config="/etc/my.cnf" \
   pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" \
   replication_user="root" replication_passwd="pandora" max_slave_lag="60" \
   evict_outdated_slaves="false" binary="/usr/sbin/mysqld" datadir="/var/lib/mysql" \
   test_user="root" test_passwd="pandora" op start interval="0" timeout="60s" \
   op stop interval="0" timeout="60s" op promote timeout="120" op demote timeout="120" \
   op monitor role="Master" timeout="30" interval="5" on-fail="restart" \
   op monitor role="Slave" timeout="30" interval="10"
node1# pcs resource create pandoraip ocf:heartbeat:IPaddr2 ip=$VIP cidr_netmask=24 \
   op monitor interval=20s
node1# pcs resource master master_pandoradb pandoradb meta master-max="1" \
   master-node-max="1" clone-max="2" clone-node-max="1" notify="true" \
   globally-unique="false" target-role="Master" is-managed="true"
node1# pcs constraint colocation add master master_pandoradb with pandoraip
node1# pcs constraint order promote master_pandoradb then start pandoraip
Check the status of the cluster:
node1# pcs status
Cluster name: pandoraha
Stack: corosync
Current DC: node1 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Fri Jun  8 13:02:21 2018
Last change: Fri Jun  8 13:02:11 2018 by root via cibadmin on node1

2 nodes configured
3 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Master/Slave Set: master_pandoradb [pandoradb]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 pandoraip      (ocf::heartbeat:IPaddr2):       Started node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Both nodes should be online (Online: [ node1 node2 ]). Other values may be different from the example.
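Optionally, once everything is green, a controlled failover can be tested by putting the current master in standby, checking that the other node is promoted and that pandoraip moves with it, and then bringing the node back. Depending on the pcs version, the equivalent pcs cluster standby / pcs cluster unstandby syntax may apply instead:

node1# pcs node standby node1
node1# pcs status
node1# pcs node unstandby node1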
Configuring the two-node cluster with "non-root" user
The configuration is similar to the previous one. The user credentials, as explained previously, must have been copied, and the following steps must be carried out:
# All nodes:
useradd <user>
passwd <user>
usermod -a -G haclient <user>

# Enable PCS ACL system
pcs property set enable-acl=true --force

# Create role
pcs acl role create <role> description="RW role" write xpath /cib

# Create PCS user - Local user
pcs acl user create <user> <role>

# Login into PCS from ALL nodes
su - <user>
pcs status
Username: <username>
Password: *****

# Wait for 'Authorized' message, ignore output. Wait a second and retry 'pcs status' command
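As a final hedged check, the ACL configuration can be listed to confirm that the role and user were registered (the exact subcommand and output format depend on the pcs version):

# List the configured ACL roles and users
pcs acl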