====== High Availability (HA) ======

{{indexmenu_n>6}}

===== Introduction =====

In critical and/or high-load environments, the load may need to be distributed among several machines, ensuring that the system stays online even if a Pandora FMS component fails. Pandora FMS has been designed to be modular, but also so that its components work in collaboration and can take over the load of components that have gone down.

Obviously, the agents are not redundant. The solution is to make the critical systems redundant (whether or not they run Pandora FMS agents) and thereby make the monitoring of those systems redundant as well.

High Availability (**HA**) can be used in several scenarios:

  * Data server.
  * Network, WMI, Plugin, Web, Prediction, Recon servers, and similar.
  * Database (**DB**).
  * Pandora FMS web console.

===== HA of Data Server =====

  * For the Pandora FMS data server, set up two machines, each with a Pandora FMS data server configured (with different //hostname// and server name).
  * A Tentacle server must be configured on each of them.
  * Each machine will have a different IP address.
  * If an external balancer is used, it will provide a single IP address to which the agents connect to send their data.
  * If the HA mechanism of the agents is used, there will be a small delay in data delivery: on each execution the agent tries to connect to the primary server and, if it does not answer, tries the secondary server (if configured that way).

If two data servers are used and both handle remote policies, collections, and configurations, the following key directories must be shared so that all data server instances can read and write to them. The consoles must also have access to these shared directories.

  * ''/var/spool/pandora/data_in/conf''
  * ''/var/spool/pandora/data_in/collections''
  * ''/var/spool/pandora/data_in/md5''
  * ''/var/spool/pandora/data_in/netflow''
  * ''/var/www/html/pandora_console/attachment''

===== Balancing in the Software Agents =====

From the Software Agents it is possible to balance data servers, since a //master// (main) and a //backup// (operational backup) data server can be configured.

In the Software Agent configuration file ''pandora_agent.conf'', [[:en:documentation:pandorafms:installation:05_configuration_agents#general_agent_parameters|the following parameters must be configured and uncommented]] (a configuration sketch follows the list):

  * **secondary_mode**: Mode of the secondary server. It can have two values:
    * **on_error**: Sends data to the secondary server only if it cannot send it to the primary server.
    * **always**: Always sends data to the secondary server, regardless of whether the primary server can be contacted.
  * **secondary_server_ip**: IP address of the secondary server.
  * **secondary_server_path**: Path where the XML files are copied on the secondary server, typically ''/var/spool/pandora/data_in''.
  * **secondary_server_port**: Port through which the XML files are copied to the secondary server: 41121 for Tentacle, 22 for SSH and 21 for FTP.
  * **secondary_transfer_mode**: Transfer mode used to copy the XML files to the secondary server: Tentacle, SSH, FTP.
  * **secondary_server_pwd**: Password option for FTP transfer.
  * **secondary_server_ssl**: Set ''yes'' or ''no'' depending on whether SSL should be used to transfer data through Tentacle.
  * **secondary_server_opts**: Any other options required for the transfer.
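As a reference, a minimal sketch of the secondary-server block in ''pandora_agent.conf'' might look like this (the IP address and the ''always'' mode are illustrative values, not mandated ones):

<code>
# Secondary server configuration (illustrative values)
secondary_mode always
secondary_server_ip 192.168.50.2
secondary_server_path /var/spool/pandora/data_in
secondary_server_port 41121
secondary_transfer_mode tentacle
secondary_server_ssl no
</code>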
If remote configuration is enabled, the remote configuration of the Software Agent is only operative on the main server.

===== HA of network, WMI, plugin, web and prediction servers, among others =====

You need to install [[:en:documentation:pandorafms:introduction/02_architecture#pandora_fms_servers|multiple servers]] (network, WMI, Plugin, Web or Prediction servers) on several machines of the network, all with the same visibility of the systems to be monitored and all on the same segment (so that network latency data is consistent).

Servers can be marked as **primary**. Those servers automatically collect the data of all modules assigned to a server that is marked as "down". Pandora FMS servers implement a mechanism to detect that one of them is down by checking its last contact date (''server threshold x 2''). A single active Pandora FMS server is enough to detect that the rest are down.

The obvious way to implement HA and load balancing in a two-node system is to assign 50% of the modules to each server and mark both servers as primary (//Master//). If there are more than two primary servers and a third server goes down with modules pending execution, the first primary server that executes the module "self-assigns" the module of the downed server. If one of the downed servers recovers, the modules that had been self-assigned to the primary server are automatically reassigned.

==== Server configuration ====

A Pandora FMS server can run in two different modes:

  * Master mode (main or ''MASTER'' mode).
  * Non-master mode.

If a server goes down (offline), its modules are executed by the master server so that no data is lost. At any given time there can only be one master server, which is chosen among the servers that have the ''master'' configuration option in ''/etc/pandora/pandora_server.conf'' set to a value greater than 0:

  master [1..7]

If the current master server goes down, a new master is chosen. If there is more than one candidate, the one with the highest ''master'' value is chosen.

Be careful when disabling servers. If a server with network modules goes down and the Network Server of the master server is disabled, those modules will not run.

Inside ''pandora_server.conf'' the following parameters have been introduced:

  * ''ha_file'': Path of the HA temporary binary file.
  * ''ha_pid_file'': Current HA process.
  * ''pandora_service_cmd'': Status control of the Pandora FMS service.

==== Add PFMS servers to an HA DB cluster ====

If [[:en:documentation:pandorafms:complex_environments_and_optimization:06_ha#high_availability_database|High Availability is enabled in the Database]], some extra configuration is needed to connect more Pandora FMS servers to the MySQL cluster.

In the ''pandora_server.conf'' file (located by default at ''/etc/pandora''), for each of the independent Pandora FMS servers to be added, the following parameters must be configured (a combined sketch follows the list):

  * ''dbuser'': Username used to access the MySQL cluster. For example:

  dbuser pandora

  * ''dbpass'': Password of that user for the MySQL cluster. For example:

  dbpass pandora

  * ''ha_hosts'': The ''ha_hosts'' parameter must be configured, followed by the IP addresses or FQDNs of the MySQL servers that make up the HA environment. Example:

  ha_hosts 192.168.80.170,192.168.80.172
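Putting these three parameters together, the relevant block in the ''pandora_server.conf'' of each added server would look like this minimal sketch (using the example values above):

<code>
dbuser pandora
dbpass pandora
ha_hosts 192.168.80.170,192.168.80.172
</code>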
===== High availability of Pandora FMS console =====

Just [[:en:documentation:pandorafms:command_center:03_installation|install another console]] pointing to the same database. Any of the consoles can be used simultaneously from different locations by different users. A [[https://pandorafms.com/guides/public/books/load-balancing-with-keepalived|web load balancer]] can be placed in front of the consoles if horizontal growth is needed to share the console load. The session system is managed through cookies, which are stored in the browser.

If remote configuration is used and it must be managed from all consoles, both data servers and consoles must share the input data directory (default: ''/var/spool/pandora/data_in'') for the remote configuration of agents, collections, and the other directories (see the topic [[:en:documentation:pandorafms:technical_annexes:15_security_architecture#possible_vulnerabilities_and_safeguards|"Security architecture"]]).

It is important to share only subdirectories within ''data_in'', and not ''data_in'' itself, as doing so would negatively affect server performance.
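The documentation does not mandate a specific sharing mechanism; NFS is one common choice. As a sketch only, assuming a hypothetical NFS server named ''nfs-server'' that exports the directories listed in the data server HA section, the mounts on each console or data server could look like this in ''/etc/fstab'':

<code>
# Illustrative NFS mounts: share only subdirectories of data_in, never data_in itself
nfs-server:/var/spool/pandora/data_in/conf         /var/spool/pandora/data_in/conf         nfs  defaults  0 0
nfs-server:/var/spool/pandora/data_in/collections  /var/spool/pandora/data_in/collections  nfs  defaults  0 0
nfs-server:/var/spool/pandora/data_in/md5          /var/spool/pandora/data_in/md5          nfs  defaults  0 0
</code>

The ''netflow'' and console ''attachment'' directories can be shared in the same way if needed.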
==== Update ====

When updating the Pandora FMS console in an HA environment by means of OUM, through **Management** → **Warp update** → [[:en:documentation:pandorafms:installation:02_anexo_upgrade|Update offline]], it is important to bear in mind the following points. Enterprise version users can download the OUM package from the Pandora FMS support website.

In a balanced environment with a shared database, updating the first console applies the corresponding changes to the database. This means that, when updating the secondary console, Pandora FMS shows an error message when it finds the information already entered in the database. However, the console is still updated.

===== High Availability Database =====

{{:wiki:icono-modulo-enterprise.png?nolink&23x23 |Enterprise version}}The main goal of this section is to offer a complete solution for HA in Pandora FMS environments. This is the only HA model with official support for Pandora FMS, **and it is provided from version 770 onwards**. This system replaces the cluster configuration with **Corosync** and **Pacemaker** used in previous versions.

The new Pandora FMS HA solution is integrated into the product (inside the ''pandora_ha'' binary). It implements an HA that supports geographically isolated sites with different IP ranges, which is not possible with **Corosync**/**Pacemaker**.

In the new HA model, the usual setup is in pairs of two, so the design does not implement a quorum system, which simplifies configuration and the necessary resources. This way, the monitoring system will work as long as there is one DB node available and, in case of a DB [[#split-brain|Split-Brain]], the system will work in parallel until both nodes are merged again.

Starting with version 772, the HA was changed to make it simpler and less error-prone. For this new HA, it is recommended to use SSD disks with a high writing/reading speed (IOPS), minimum 500 Mb/s (or more, depending on the environment). The latency between servers must also be taken into account, since with very high latencies it is difficult to synchronize both databases in time.

The new proposal seeks to solve three problems:

  - Complexity and maintainability of the previous system (up to version NG 770).
  - Possibility of having an HA environment spread over different geographical locations, with non-local network segmentation.
  - Data recovery procedure in case of [[:en:documentation:pandorafms:complex_environments_and_optimization:06_ha#split-brain|Split-Brain]] and secured system operation in case of communication breakdown between the two geographically separated sites.

The new HA system for the DB is implemented on Percona 8, although future versions will also detail how to do it on MySQL/MariaDB 8.

Pandora FMS relies on a MySQL database to configure and store data, so a failure in the database can temporarily paralyze the monitoring tool. The Pandora FMS high availability database cluster allows you to easily deploy a strong and fail-safe architecture.

This is an advanced feature that requires knowledge of GNU/Linux systems. It is important that all servers have their time synchronized with an NTP server (**chronyd** service in Rocky Linux 8).

The nodes of the binary replication MySQL cluster are managed with the ''pandora_ha'' binary, starting with version 770 (Enterprise feature). Percona was chosen as the default RDBMS for its scalability, availability, security and backup features.

Active/passive [[http://dev.mysql.com/doc/refman/5.7/en/replication.html|replication]] takes place from a single master node (with writing permissions) to any number of secondaries (read-only). If the master node fails, one of the secondaries is promoted to master and ''pandora_ha'' takes care of updating the IP address of the master node.

The environment consists of the following elements:

  * MySQL 8 servers with binary replication enabled (Active/Passive).
  * A server with ''pandora_ha'', configured with all the MySQL servers, which continuously monitors them and performs the slave-master and master-slave promotions necessary for the correct operation of the cluster.

{{ :wiki:pfms-database_cluster.png }}

==== Installation of Percona 8 ====

Version 770 or later.

=== Percona 8 Installation for RHEL 8 and Rocky Linux 8 ===

First of all, the Percona repository must be installed on all nodes of the environment in order to install the Percona server packages afterwards. Open a terminal window with **root** rights or as the **root** user; you are solely responsible for that password. The following instructions indicate whether you should run each command on all devices, on some devices, or on one device in particular; please pay attention to these statements.

Execute on **all** devices involved:

  yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm

Activate version 8 of the Percona repository on **all** devices:

  percona-release setup ps80

Install the Percona server together with the backup tool used to perform the backups for the manual synchronization of both environments. Run on **all** devices involved:

  yum install percona-server-server percona-xtrabackup-80
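As an optional check (not part of the original procedure), you can confirm that both packages were installed and report the expected version:

<code>
mysqld --version
xtrabackup --version
</code>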
If you install the Percona server together with the Web Console and the PFMS server, you can use the [[:en:documentation:pandorafms:installation:01_installing#requirements_for_the_use_of_the_online_installation_tool_online|deploy]] tool, indicating MySQL version 8 by means of the ''MYVER=80'' parameter:

  curl -Ls https://pfms.me/deploy-pandora-el8 | env MYVER=80 bash

=== Installing Percona 8 on Ubuntu Server ===

Install version 8 of the Percona repository on **all** devices:

  curl -O https://repo.percona.com/apt/percona-release_latest.generic_all.deb
  apt install -y gnupg2 lsb-release ./percona-release_latest.generic_all.deb

Activate version 8 of the Percona repository on **all** devices:

  percona-release setup ps80

Install the Percona server together with the backup tool used to perform the backups for the manual synchronization of both environments. On **all** devices run:

  apt install -y percona-server-server percona-xtrabackup-80

==== Binary replication configuration ====

It is recommended to store the //binlogs// generated by replication on an additional partition or external disk whose size is the same as the one reserved for the database. See also the ''log-bin'' and ''log-bin-index'' //tokens//.

Once the MySQL server is installed on all cluster nodes, proceed to configure both environments so that they replicate each other. First of all, prepare the ''my.cnf'' configuration file so that binary replication works correctly.

=== Node 1 ===

Node 1 ''/etc/my.cnf'' (''/etc/mysql/my.cnf'' for Ubuntu server):

<code>
[mysqld]
server_id=1  # It is important that it is different in all nodes.
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

# OPTIMIZATION FOR PANDORA FMS
innodb_buffer_pool_size = 4096M
innodb_lock_wait_timeout = 90
innodb_file_per_table
innodb_flush_method = O_DIRECT
innodb_log_file_size = 64M
innodb_log_buffer_size = 16M
thread_cache_size = 8
max_connections = 200
key_buffer_size=4M
read_buffer_size=128K
read_rnd_buffer_size=128K
sort_buffer_size=128K
join_buffer_size=4M
sql_mode=""

# SPECIFIC PARAMETERS FOR BINARY REPLICATION
binlog-do-db=pandora
replicate-do-db=pandora
max_binlog_size = 100M
binlog-format=MIXED
binlog_expire_logs_seconds=172800 # 2 DAYS
#log-bin=/path                      # Uncomment to set an additional storage path for binlogs
#log-bin-index=/path/archive.index  # Uncomment to set an additional storage path for binlogs
sync_source_info=1
sync_binlog=1
port=3306
report-port=3306
report-host=master
gtid-mode=off
enforce-gtid-consistency=off
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync_relay_log = 0
replica_compressed_protocol = 1
replica_parallel_workers = 1
innodb_flush_log_at_trx_commit = 2
innodb_flush_log_at_timeout = 1800

[client]
user=root
password=pandora
</code>

  * The //tokens// after the **OPTIMIZATION FOR PANDORA FMS** comment hold the configuration optimized for Pandora FMS.
  * After the **SPECIFIC PARAMETERS FOR BINARY REPLICATION** comment, the parameters specific to binary replication are configured.
  * The ''binlog_expire_logs_seconds'' token is configured for a period of two days.
  * In the ''[client]'' subsection, enter the user and password used for the database; by default, when installing PFMS, they are ''root'' and ''pandora'' respectively.

//These values are **necessary** to perform the backups without specifying a user (automated).//

It is important that the ''server_id'' token is different in all nodes; in this example, ''1'' is used for node 1.
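If you follow the recommendation above and dedicate a separate partition or disk to the //binlogs//, the two commented //tokens// could be set, for instance, like this (the ''/binlog'' mount point is illustrative):

<code>
log-bin=/binlog/binlog
log-bin-index=/binlog/binlog.index
</code>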
=== Node 2 ===

Node 2 ''/etc/my.cnf'' (''/etc/mysql/my.cnf'' for Ubuntu server):

<code>
[mysqld]
server_id=2  # It is important that it is different in all nodes.
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

# OPTIMIZATION FOR PANDORA FMS
innodb_buffer_pool_size = 4096M
innodb_lock_wait_timeout = 90
innodb_file_per_table
innodb_flush_method = O_DIRECT
innodb_log_file_size = 64M
innodb_log_buffer_size = 16M
thread_cache_size = 8
max_connections = 200
key_buffer_size=4M
read_buffer_size=128K
read_rnd_buffer_size=128K
sort_buffer_size=128K
join_buffer_size=4M
sql_mode=""

# SPECIFIC PARAMETERS FOR BINARY REPLICATION
binlog-do-db=pandora
replicate-do-db=pandora
max_binlog_size = 100M
binlog-format=MIXED
binlog_expire_logs_seconds=172800 # 2 DAYS
#log-bin=/path                      # Uncomment to set an additional storage path for binlogs
#log-bin-index=/path/archive.index  # Uncomment to set an additional storage path for binlogs
sync_source_info=1
sync_binlog=1
port=3306
report-port=3306
report-host=master
gtid-mode=off
enforce-gtid-consistency=off
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync_relay_log = 0
replica_compressed_protocol = 1
replica_parallel_workers = 1
innodb_flush_log_at_trx_commit = 2
innodb_flush_log_at_timeout = 1800

[client]
user=root
password=pandora
</code>

  * The //tokens// after the **OPTIMIZATION FOR PANDORA FMS** comment hold the configuration optimized for Pandora FMS.
  * After the **SPECIFIC PARAMETERS FOR BINARY REPLICATION** comment, the parameters specific to binary replication are configured.
  * The ''binlog_expire_logs_seconds'' token is configured for a period of two days.
  * In the ''[client]'' subsection, enter the user and password used for the database; by default, when installing PFMS, they are ''root'' and ''pandora'' respectively.

//These values are **necessary** to perform the backups without specifying a user (automated).//

It is important that the ''server_id'' token is different in all nodes; in this example, ''2'' is used for node 2.
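As an optional sanity check (not part of the original procedure), once ''mysqld'' is running on a node you can verify that it picked up its own ''server_id'' and that binary logging is active:

<code>
mysql -u root -p -e "SHOW VARIABLES WHERE Variable_name IN ('server_id','log_bin');"
</code>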
=== Master node configuration ===

Once both nodes have the correct configuration, start configuring the node //that will take the master server role//.

**1.-** Start the ''mysqld'' service:

  systemctl start mysqld

**2.-** Access with the temporary **root** password that was generated in the log file ''/var/log/mysqld.log'':

  grep "temporary password" /var/log/mysqld.log

With the password that appears, access the MySQL server:

  mysql -u root -p

''Password'': → Enter the password shown by the **grep** command.

**3.-** Change the temporary password of the **root** user to ''pandora''. Remember that the **mysql >** prompt corresponds to the MySQL command interpreter (**MySQL CLI**):

**mysql >** ''ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'Pandor4!';''

**mysql >** ''UNINSTALL COMPONENT 'file://component_validate_password';''

**mysql >** ''ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'pandora';''

**4.-** Create the binary replication user and the **root** user for remote connections and cluster administration:

**mysql >** ''CREATE USER slaveuser@'%' IDENTIFIED WITH mysql_native_password BY 'pandora';''

**mysql >** ''GRANT REPLICATION CLIENT, REPLICATION SLAVE on *.* to slaveuser@'%';''

**mysql >** ''CREATE USER root@'%' IDENTIFIED WITH mysql_native_password BY 'pandora';''

**mysql >** ''GRANT ALL PRIVILEGES ON *.* to root@'%';''

**5.-** Create the Pandora FMS database:

**mysql >** ''create database pandora;''

**mysql >** ''use pandora;''

**mysql >** ''source /var/www/html/pandora_console/pandoradb.sql''

**mysql >** ''source /var/www/html/pandora_console/pandoradb_data.sql''

For the **source** command: this works as long as the Pandora FMS console is installed on the same server; //otherwise, send these files to the master server//.

**6.-** Create the ''pandora'' user and grant access privileges to this user:

**mysql >** ''CREATE USER pandora@'%' IDENTIFIED WITH mysql_native_password BY 'pandora';''

**mysql >** ''grant all privileges on pandora.* to pandora@'%';''

At this point, the master server is ready to start replicating the Pandora FMS database.

=== Database cloning ===

The next step is to make a clone of the master (**MASTER**) database on the slave node (**SLAVE**). To do so, follow the steps below:

**1.-** Make a complete dump of the **MASTER** database:

**MASTER** # ''xtrabackup --backup --target-dir=/root/pandoradb.bak/''

**MASTER** # ''xtrabackup --prepare --target-dir=/root/pandoradb.bak/''

**2.-** Get the position of the backup binary log:

**MASTER** # ''cat /root/pandoradb.bak/xtrabackup_binlog_info''

It will return something like the following:

  binlog.000003        157

//Take note of these two values, as they are needed in step 6.//

**3.-** Send the backup to the **SLAVE** server using **rsync**:

**MASTER** # ''rsync -avpP -e ssh /root/pandoradb.bak/ node2:/var/lib/mysql/''

**4.-** On the **SLAVE** server, configure the permissions so that the MySQL server can access the sent files without any problem:

**SLAVE** # ''chown -R mysql:mysql /var/lib/mysql''

**SLAVE** # ''chcon -R system_u:object_r:mysqld_db_t:s0 /var/lib/mysql''

**5.-** Start the ''mysqld'' service on the **SLAVE** server:

  systemctl start mysqld

**6.-** Enable SLAVE mode on this server (use the values noted in step 2):

**SLAVE** # ''mysql -u root -ppandora''

**mysql >** ''reset slave all;''

**mysql >** ''CHANGE MASTER TO MASTER_HOST='node1', MASTER_USER='slaveuser', MASTER_PASSWORD='pandora', MASTER_LOG_FILE='binlog.000003', MASTER_LOG_POS=157;''

**mysql >** ''start slave;''

**mysql >** ''SET GLOBAL read_only=1;''

Once you are done with all these steps, if you run the ''show slave status'' command inside the MySQL shell you will notice that the node is set as slave. If it has been configured correctly, you should see an output like the following:
<code>
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: node1
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000018
          Read_Master_Log_Pos: 1135140
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 1135306
        Relay_Master_Log_File: binlog.000018
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: pandora
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1135140
              Relay_Log_Space: 1135519
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: fa99f1d6-b76a-11ed-9bc1-000c29cbc108
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0
            Network_Namespace:
1 row in set, 1 warning (0,00 sec)
</code>

At this point, you can be sure that binary replication is enabled and working correctly.

==== pandora_server configuration ====

Version 770 or later.

A series of parameters required for the correct operation of ''pandora_ha'' must be configured inside [[:en:documentation:pandorafms:installation:04_configuration#server|the pandora_server.conf file]]. The parameters to be added are the following (a combined configuration sketch is shown after the list):

  * **ha_mode [pandora|pacemaker]**: For the current ''pandora_ha'' setup, use the ''pandora'' value; if the previous mode (version 769 and earlier) is used, use the ''pacemaker'' value. Example:

  ha_mode pandora

  * **ha_hosts <IP_ADDRESS1>,<IP_ADDRESS2>**: Configure the ''ha_hosts'' parameter followed by the IP addresses or FQDNs of the MySQL servers that make up the HA environment. The IP address placed first will have preference to be the **MASTER** server, or at least to hold the master role when the HA environment is first started. Example:

  ha_hosts 192.168.80.170,192.168.80.172

  * **ha_dbuser** and **ha_dbpass**: User and password of the **root** user, or otherwise of a MySQL user with maximum privileges, which will be in charge of performing all the //master-slave// promotion operations on the nodes. Example:

  ha_dbuser root
  ha_dbpass pandora

  * **repl_dbuser** and **repl_dbpass**: Parameters defining the replication user that the **SLAVE** will use to connect to the **MASTER**. Example:

  repl_dbuser slaveuser
  repl_dbpass pandora

  * **ha_sshuser** and **ha_sshport**: User/port used to connect by **ssh** to the Percona/MySQL servers to perform the recovery operations. For this option to work correctly, the SSH keys must be shared between the user running the ''pandora_ha'' service and the user indicated in the **ha_sshuser** parameter. Example:
  ha_sshuser root
  ha_sshport 22

  * **ha_resync <PATH_SCRIPT_RESYNC>**: Script that performs the resynchronization of the nodes. By default it is located at:

  /usr/share/pandora_server/util/pandora_ha_resync_slave.sh

If you have a customized installation of the script, indicate its location in this parameter so that the automatic or manual synchronization of the **SLAVE** node can be performed when needed:

  ha_resync /usr/share/pandora_server/util/pandora_ha_resync_slave.sh

  * **ha_resync_log**: Path of the log where all the information related to the executions of the synchronization script configured in the previous token is stored. Example:

  ha_resync_log /var/log/pandoraha_resync.log

  * **ha_connect_retries**: Number of attempts performed on each check against each of the servers in the HA environment before making any change in the environment. Example:

  ha_connect_retries 2

Once all these parameters are configured, the Pandora FMS server can be started with the ''pandora_ha'' service. The server gets an image of the environment and learns which server is the **MASTER**. When it knows it, it creates the ''pandora_ha_hosts.conf'' file in the ''/var/spool/pandora/data_in/conf/'' folder, where the Percona/MySQL server holding the **MASTER** role is indicated at all times.

If the ''incomingdir'' parameter of the ''pandora_server.conf'' file contains a different path (PATH), this file will be located in that PATH. This file is used as an exchange with the Pandora FMS Console, so that the console knows at any time the IP address of the Percona/MySQL server with the **MASTER** role.

  * **restart**: Set it to ''0'', since the **pandora_ha** daemon is in charge of restarting the service in case of failure, thus avoiding possible conflicts. Example:

  # Pandora FMS will restart after restart_delay seconds on critical errors.
  restart 0
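Bringing the //tokens// above together, a minimal sketch of the HA block in ''pandora_server.conf'' could look like this (the IP addresses and credentials are the example values used throughout this section):

<code>
ha_mode pandora
ha_hosts 192.168.80.170,192.168.80.172
ha_dbuser root
ha_dbpass pandora
repl_dbuser slaveuser
repl_dbpass pandora
ha_sshuser root
ha_sshport 22
ha_resync /usr/share/pandora_server/util/pandora_ha_resync_slave.sh
ha_resync_log /var/log/pandoraha_resync.log
ha_connect_retries 2
restart 0
</code>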
=== SSH key sharing between servers ===

An OpenSSH server must be installed and running on each host. Suppress the welcome message or banner displayed by OpenSSH; run on **all** devices:

  [ -f /etc/cron.hourly/motd_rebuild ] && rm -f /etc/cron.hourly/motd_rebuild
  sed -i -e 's/^Banner.*//g' /etc/ssh/sshd_config
  systemctl restart sshd

Share the SSH keys between ''pandora_ha'' and all the existing Percona/MySQL servers in the environment; **run on the Pandora FMS server**:

  printf "\n\n\n" | ssh-keygen -t rsa -P ''
  ssh-copy-id -p22 root@node1
  ssh-copy-id -p22 root@node2

  * If the installation is on Ubuntu Server, enable the **root** user to connect via SSH. Do so by setting a password for the **root** user with the ''sudo passwd root'' command.
  * Then allow the SSH connection of the **root** user, at least through shared keys ("**PermitRootLogin without-password**"), in the configuration file of the ''sshd'' service.

=== Using the synchronization script ===

The Pandora FMS server ships a script that allows you to synchronize the **SLAVE** database in case it gets out of sync. Its manual execution is the following:

  ./pandora_ha_resync_slave.sh "pandora_server.conf file" MASTER SLAVE

For example, to perform a manual synchronization from node 1 to node 2, the execution would be:

  /usr/share/pandora_server/util/pandora_ha_resync_slave.sh /etc/pandora/pandora_server.conf node1 node2

To configure the automatic recovery of the HA environment when there is any synchronization problem between **MASTER** and **SLAVE**, set the ''splitbrain_autofix'' configuration token to ''1'' in the server configuration file (''/etc/pandora/pandora_server.conf'').

This way, whenever a [[:en:documentation:pandorafms:complex_environments_and_optimization:06_ha#split-brain|Split-Brain]] occurs (both servers hold the master role) or there is any synchronization problem between the **MASTER** and **SLAVE** nodes, ''pandora_ha'' will try to launch the ''pandora_ha_resync_slave.sh'' script to synchronize the **MASTER** server status onto the **SLAVE** server from that point on. This process generates events in the system indicating its start, its end, and any errors that took place.

==== Pandora FMS Console Configuration ====

Version 770 or later.

A new parameter has been added to the ''config.php'' configuration indicating the path of the exchange directory that Pandora FMS uses by default, ''/var/spool/pandora/data_in''. If it is configured, the console will look for the file ''/var/spool/pandora/data_in/conf/pandora_ha_hosts.conf'', from which it gets the IP address to make the connection:

  $config["remote_config"] = "/var/spool/pandora/data_in";

In the Pandora FMS console you can see the cluster status by accessing the Manage HA view:

  https://PANDORA_IP/pandora_console/index.php?sec=gservers&sec2=enterprise/godmode/servers/HA_cluster

{{ :wiki:pfms-servers-manage_database_ha-view_nodes.png }}

The data in this view is constantly updated thanks to ''pandora_ha''. There is no need for any previous configuration procedure to see this section, as long as ''pandora_server.conf'' is correctly configured with the parameters mentioned in the previous section.

Among the available actions, you may configure a label for each of the nodes, and you may synchronize the **SLAVE** node through the icon {{:wiki:pfms-servers-manage_database_ha-resync_node_icon.png?nolink&}}. This icon can have the following states:

  * **Green**: Normal; no operation to be performed.
  * **Blue**: Resynchronization pending.
  * **Yellow**: Resynchronization in progress.
  * **Red**: //Error//, resynchronization failed.

In **Setup** → **Setup** → **Enterprise**, the **Legacy HA database management** token must be deactivated for the HA view to be displayed with the configuration of this new mode.

{{ :wiki:pfms-setup-setup-enterprise-lecgacy_ha_database_management.png }}

==== Corosync-Pacemaker HA environment migration ====

The main difference between the HA environment used with MySQL/Percona Server version 5 and the current HA mode is that ''pandora_ha'' is now used to manage the cluster nodes, replacing Corosync-Pacemaker, which is no longer used from now on.

The migration of the environment consists of the following steps:

**1.-** Upgrade **Percona** from version 5.7 to version 8.0: "[[:en:documentation:pandorafms:technical_annexes:19_mysql_8|Installation and upgrade to MySQL 8]]".
**2.-** Install **xtrabackup-80** on **all** devices:

  yum install percona-xtrabackup-80

If you use Ubuntu server, see the section "[[:en:documentation:pandorafms:complex_environments_and_optimization:06_ha#installing_percona_8_on_ubuntu_server|Installing Percona 8 on Ubuntu Server]]".

**3.-** Create all users again with the ''mysql_native_password'' token on the **MASTER** node:

**mysql >** ''CREATE USER slaveuser@'%' IDENTIFIED WITH mysql_native_password BY 'pandora';''

**mysql >** ''GRANT REPLICATION CLIENT, REPLICATION SLAVE on *.* to slaveuser@'%';''

**mysql >** ''CREATE USER pandora@'%' IDENTIFIED WITH mysql_native_password BY 'pandora';''

**mysql >** ''grant all privileges on pandora.* to pandora@'%';''

**4.-** Dump the database from the **MASTER** node to the **SLAVE** node:

**4.1.-** Make the full dump of the **MASTER** database:

**MASTER** # ''xtrabackup --backup --target-dir=/root/pandoradb.bak/''

**MASTER** # ''xtrabackup --prepare --target-dir=/root/pandoradb.bak/''

**4.2.-** Get the position of the backup binary log:

**MASTER** # ''cat /root/pandoradb.bak/xtrabackup_binlog_info''

  binlog.000003        157

//Take note of these two values, as they are needed in step 4.6.//

**4.3.-** Synchronize with **rsync** with the **SLAVE** server to send the backup:

**SLAVE** # ''rm -rf /var/lib/mysql/*''

**MASTER** # ''rsync -avpP -e ssh /root/pandoradb.bak/ node2:/var/lib/mysql/''

**4.4.-** On the **SLAVE** server, configure the permissions so that the MySQL server can access the sent files without any issues:

**SLAVE** # ''chown -R mysql:mysql /var/lib/mysql''

**SLAVE** # ''chcon -R system_u:object_r:mysqld_db_t:s0 /var/lib/mysql''

**4.5.-** Start the ''mysqld'' service on the **SLAVE** server:

  systemctl start mysqld

**4.6.-** Enable **SLAVE** mode on this server (use the values from step 4.2):

**SLAVE** # ''mysql -u root -ppandora''

**mysql >** ''reset slave all;''

**mysql >** ''CHANGE MASTER TO MASTER_HOST='node1', MASTER_USER='slaveuser', MASTER_PASSWORD='pandora', MASTER_LOG_FILE='binlog.000003', MASTER_LOG_POS=157;''

**mysql >** ''start slave;''

**mysql >** ''SET GLOBAL read_only=1;''

If you want to install the environment from scratch on a new server, the migration procedure simply consists of installing the new environment from scratch as the current procedure indicates and, in the step where the Pandora FMS database is created, importing the data from a backup of the database of the old environment. At the same time, the Pandora FMS Console and Server configuration indicated in the previous sections must be applied in the new environment.

==== Split-Brain ====

Due to several factors (high latencies, network outages, etc.), both **MySQL** servers may end up acquiring the master role. If the ''autoresync'' option is not enabled in ''pandora_ha'', the server cannot by itself choose which node will work as master and synchronize the master node with the slave one, so all the information collected by one of the servers could be lost.

**To solve this problem, data can be merged by following this procedure.**

This manual procedure **only** covers retrieving data and events within the interval between two dates. It assumes that only data from agents/modules that already exist in the node where the data merge is performed will be recovered. If new agents were created during the Split-Brain period, or new configuration information (alerts, policies, etc.), **these will not be taken into account**. Only data and events will be retrieved.
That is, data related to the ''tagente_datos'', ''tagente_datos_string'' and ''tevento'' [[:en:documentation:pandorafms:complex_environments_and_optimization:09_pandorafms_engineering#database_main_tables|tables]].

The following commands will be executed on the node that was disconnected (the one to be promoted to **SLAVE**), where ''yyyy-mm-dd hh:mm:ss'' is the Split-Brain start date and time and ''yyyy2-mm2-dd2 hh2:mm2:ss2'' its end date and time.

Run the **mysqldump** command with appropriate user rights to get a data dump:

<code>
mysqldump -u root -p -n -t --skip-create-options --databases pandora --tables tagente_datos --where='FROM_UNIXTIME(utimestamp) > "yyyy-mm-dd hh:mm:ss" AND FROM_UNIXTIME(utimestamp) < "yyyy2-mm2-dd2 hh2:mm2:ss2"' > tagente_datos.dump.sql

mysqldump -u root -p -n -t --skip-create-options --databases pandora --tables tagente_datos_string --where='FROM_UNIXTIME(utimestamp) > "yyyy-mm-dd hh:mm:ss" AND FROM_UNIXTIME(utimestamp) < "yyyy2-mm2-dd2 hh2:mm2:ss2"' > tagente_datos_string.dump.sql

mysqldump -u root -p -n -t --skip-create-options --databases pandora --tables tevento --where='FROM_UNIXTIME(utimestamp) > "yyyy-mm-dd hh:mm:ss" AND FROM_UNIXTIME(utimestamp) < "yyyy2-mm2-dd2 hh2:mm2:ss2"' | sed -e "s/([0-9]*,/(NULL,/gi" > tevento.dump.sql
</code>

Once the dumps of these tables have been obtained, load the data on the **MASTER** node:

**MASTER** # ''cat tagente_datos.dump.sql | mysql -u root -p pandora''

**MASTER** # ''cat tagente_datos_string.dump.sql | mysql -u root -p pandora''

**MASTER** # ''cat tevento.dump.sql | mysql -u root -p pandora''

After loading the data retrieved from the node to be promoted to **SLAVE**, synchronize it using the following procedure:

**0.-** On the **MASTER**, delete the directory ''/root/pandoradb.bak/''.

**1.-** Dump the **MASTER** database:

**MASTER** # ''xtrabackup --backup --target-dir=/root/pandoradb.bak/''

**MASTER** # ''xtrabackup --prepare --target-dir=/root/pandoradb.bak/''

**2.-** Get the position of the binary log of the backed-up data:

**MASTER** # ''cat /root/pandoradb.bak/xtrabackup_binlog_info''

You will get something like the following (take due note of these values):

  binlog.000003        157

At this point, clear the contents of ''/var/lib/mysql/'' on the **SLAVE**, [[#corosync-pacemaker_ha_environment_migration|as indicated in step 4.3 above]]:

**SLAVE** # ''rm -rf /var/lib/mysql/*''

**3.-** Send the backup to the slave server with the **rsync** command:

**MASTER** # ''rsync -avpP -e ssh /root/pandoradb.bak/ node2:/var/lib/mysql/''

**4.-** On the slave server, configure the permissions so that the **MySQL** server can access the sent files without any issues:

**SLAVE** # ''chown -R mysql:mysql /var/lib/mysql''

**SLAVE** # ''chcon -R system_u:object_r:mysqld_db_t:s0 /var/lib/mysql''

**5.-** Start the ''mysqld'' service on the slave server:

  systemctl start mysqld

**6.-** Enable slave mode on this server:

**SLAVE** # ''mysql -u root -ppandora''

**mysql >** ''reset slave all;''

**mysql >** ''CHANGE MASTER TO MASTER_HOST='node1', MASTER_USER='slaveuser', MASTER_PASSWORD='pandora', MASTER_LOG_FILE='binlog.000003', MASTER_LOG_POS=157;''

**mysql >** ''start slave;''

**mysql >** ''SET GLOBAL read_only=1;''

Once all these steps have been completed, if you run the ''show slave status'' command inside the **MySQL** shell, you will see that the node is in slave mode. If it was configured properly, you should see an output like the following example:
<code>
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: node1
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000018
          Read_Master_Log_Pos: 1135140
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 1135306
        Relay_Master_Log_File: binlog.000018
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: pandora
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1135140
              Relay_Log_Space: 1135519
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: fa99f1d6-b76a-11ed-9bc1-000c29cbc108
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0
            Network_Namespace:
1 row in set, 1 warning (0,00 sec)
</code>

At this point, you can be sure that binary replication is enabled and working properly again.

[[:en:documentation:start|Go back to Pandora FMS documentation index]]