Pandora FMS Enterprise in High Availability
High Availability in monitoring with Pandora FMS
Pandora FMS is a proactive, advanced and flexible monitoring tool that is easy to configure for each business. It adapts to servers, network equipment, terminals and anything else that needs to be monitored.
This article applies exclusively to Pandora FMS Enterprise; what is described here cannot be used in Pandora FMS Community.
In this article, we focus on running Pandora FMS in high availability, so that the solution keeps working if a server fails. This matters because the monitoring tool is the one in charge of sending alerts when any datacenter component fails.
An essential requirement for high availability is to have at least two instances of each component so that, if any one of them fails, Pandora FMS keeps collecting metrics, sending alerts, showing health dashboards and doing whatever else is important to your implementation. To understand this architecture, you can see the basic working overview for this type of configuration.
This scenario uses virtual machines running on two virtualization servers. You can use VMware ESXi, Hyper-V, KVM or any other virtualization platform that supports CentOS 7 guests.
| Server | Components | Detail |
|---|---|---|
| PandoraServer01 | Pandora Server, Pandora Console | Server HA |
| PandoraServer02 | Pandora Server, Pandora Console | Server HA |
| PandoraDB01 | Percona DB | DB HA |
| PandoraDB02 | Percona DB | DB HA |
| PandoraDB03 | Percona DB Historical Data | History |
To understand how the high availability solution works, we are going to describe the components that appear in the table (Detail), together with the load balancer, and how Pandora FMS works in a mission-critical scenario such as an ISP (Internet Service Provider). Once this is clear, we will see how the solution works.
HLB: Load balancer. This component fulfills a very important function: it exposes a Virtual IP (VIP), checks the state of each Pandora FMS Enterprise server and, if one of the servers is not available, sends the load to the other. We will come back to this point later. The ports configured in the balancer are as follows:
- 80 TCP – http (WEB Console).
- 443 TCP – https (WEB Console with SSL Certificate).
- 41121 TCP – Tentacle (It receives metrics from the agents).
- 162 UDP – SNMP Traps (It receives alerts from SNMP devices).
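The kind of check a balancer runs against each backend can be sketched in a few lines of Python. This is only an illustration of a TCP health check, not the balancer's actual implementation; the function names are hypothetical, and the UDP port 162 is left out because a plain connect test does not apply to UDP.

```python
import socket

# TCP ports from the list above; 162/UDP (SNMP traps) is intentionally
# omitted because a TCP connect test cannot probe a UDP listener.
TCP_PORTS = {80: "http", 443: "https", 41121: "tentacle"}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_backend(host, ports=None, timeout=2.0):
    """Map each service name to True/False for one backend server."""
    ports = TCP_PORTS if ports is None else ports
    return {name: port_is_open(host, port, timeout) for port, name in ports.items()}
```

Calling `check_backend("PandoraServer01")` would return a dictionary such as `{"http": ..., "https": ..., "tentacle": ...}`, assuming that host name resolves in your environment.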
Server HA: Two Pandora FMS servers, each running all the server components that collect agent metrics together with the web console used to operate the solution.
DB HA: Pandora FMS works with a Percona database, which supports high availability with the master-slave pattern. Data is replicated between the nodes, and a Virtual IP (VIP) keeps the solution operational if one of the Pandora FMS database servers fails.
History: Stores historical data. If necessary, the historical DB can also be built with two Percona servers in High Availability, in the same way as DB HA.
Complete example of a High Availability scenario in Pandora FMS Enterprise
After an overall description of what a High Availability scenario looks like, let’s go through a full example. Before starting, take a look at the working diagram.
| Name | IP address | Detail |
|---|---|---|
| Floating LB | 192.168.200.10 | VIP Server |
| Floating DB | 192.168.200.20 | VIP DB HA |
Pandora FMS and HA database health
A fundamental part of a High Availability configuration is the health of every single component. Pandora FMS Enterprise has integrated monitoring of Percona database replication, which lets you check the health status of the database and also perform corrective operations without needing advanced database knowledge, as seen in the following image.
Pandora FMS Server and its components
In the following image, you can see the Pandora FMS Enterprise server components running on both servers.
Part of the Pandora FMS Enterprise configuration is saved in the Percona database. Changes to that configuration are reflected on both servers immediately.
Updates and changes
Bear in mind that, since this is a modular solution, all components must run the same version. Updates are applied to all components at the same time, as follows:
That way this scenario stays up to date, and a high SLA can be provided by building on Pandora FMS strengths.
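The "same version everywhere" rule can be verified with a small script after an update. The sketch below is only an illustration: how the version of each component is collected is left out, and the component names and version strings in the usage example are made up.

```python
from collections import Counter


def version_mismatches(versions):
    """Return the components whose version differs from the most common one.

    `versions` maps a component name to its version string.
    """
    if not versions:
        return {}
    expected, _count = Counter(versions.values()).most_common(1)[0]
    return {name: ver for name, ver in versions.items() if ver != expected}
```

For example, `version_mismatches({"server01": "A", "console01": "A", "server02": "B"})` returns `{"server02": "B"}`, which points at the node that was missed during the update.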
Pandora FMS has a configuration in the file
When making any changes to it, make them on both servers and restart the pandora_server service on each so that everything works properly.
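To confirm that both servers ended up with the same settings, the two configuration files can be compared key by key. The sketch below assumes a simple format of one `key value` pair per line with `#` comments; that format, the function names and the keys in the usage example are assumptions for illustration. Some settings may legitimately differ between servers, so the result is a list to review, not a list of errors.

```python
def parse_conf(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def conf_differences(text_a, text_b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    conf_a, conf_b = parse_conf(text_a), parse_conf(text_b)
    return {
        key: (conf_a.get(key), conf_b.get(key))
        for key in sorted(conf_a.keys() | conf_b.keys())
        if conf_a.get(key) != conf_b.get(key)
    }
```

Reading the file from each server and passing both contents to `conf_differences` gives an empty dictionary when the two files agree.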
Files shared by both consoles
For all changes to be reflected in both consoles, shared storage can be added and the following folders mounted through NFS, so that both consoles can write to them simultaneously.
| Folder | Content |
|---|---|
| /var/spool/pandora/data_in/md5 | Agent configuration hash |
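A quick way to confirm that a folder really lives on NFS is to look it up in the mount table. The sketch below parses text in the Linux `/proc/mounts` layout (device, mount point, file system type, ...) and finds the mount that covers a path. The function names are hypothetical, and the mount point in the usage example is an assumption, not something the configuration above prescribes.

```python
import posixpath

NFS_TYPES = ("nfs", "nfs4")


def filesystem_type(path, mounts_text):
    """Return the file system type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout. The longest matching mount
    point wins; on equal length, the later line wins.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if `path` is covered by an NFS mount."""
    return filesystem_type(path, mounts_text) in NFS_TYPES
```

On each console you could read `/proc/mounts` and call `is_on_nfs("/var/spool/pandora/data_in/md5", contents)` to check that the shared folder is not silently sitting on local disk.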
With this configuration you will have Pandora FMS Enterprise running in High Availability. You can find more details about these settings in our official documentation.
Pandora FMS Enterprise in High Availability
Having seen how Pandora FMS Enterprise is deployed in High Availability, let’s go into a little more detail about its benefits, what it allows you to do and, essentially, how it works. First, we are going to take a closer look at two very important terms in order to better understand this type of architecture.
What is a high availability architecture?
A high availability solution, known by its acronym HA, is applied when you want a contingency plan for any component that runs into an anomalous situation, so that services keep being provided.
What does it mean for an architecture to offer scalability?
Scalability is the ability of a system to adapt and keep its performance as the number of users grows significantly. Although the concept seems clear, system scalability is a complex and important design aspect.
The big difference between a High Availability architecture and a highly scalable architecture is the number of computers that make it up. For High Availability, two instances of each component may be enough; a highly scalable solution requires N instances, depending on the load. In practice, though, both are designed in the same way.
In Pandora FMS Enterprise, depending on the architecture, you will have all Pandora FMS servers running on a single computer, as you may see in the following diagram:
In the Pandora FMS architecture, the actions carried out by server components such as the Plugin Server and the Network Server are executed from the Pandora FMS Enterprise servers towards the monitored devices. Since the configuration is stored in the database, if one server fails, these actions are executed from the one that is still working.
Access to the console, SNMP traps and agents reporting to TCP port 41121 (Tentacle) go through the load balancer, which lets you choose among different balancing strategies, such as giving priority to one server or distributing the load while keeping sessions.
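The two strategies mentioned above can be illustrated with a short Python sketch. It is a simplified model, not the configuration of any particular balancer: the function names are hypothetical, and session keeping is approximated by hashing a client identifier such as its IP address.

```python
import zlib


def pick_by_priority(servers, healthy):
    """Priority strategy: return the first healthy server in the given order."""
    for server in servers:
        if healthy.get(server, False):
            return server
    return None


def pick_sticky(servers, healthy, client_id):
    """Distribute load while keeping sessions.

    The same client_id always maps to the same server as long as the set of
    healthy servers does not change.
    """
    alive = [s for s in servers if healthy.get(s, False)]
    if not alive:
        return None
    index = zlib.crc32(client_id.encode("utf-8")) % len(alive)
    return alive[index]
```

With the priority strategy, all traffic goes to the first server and only moves when it fails. With the sticky strategy, clients are spread across both servers, and each client keeps landing on the same one.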
Let’s see a graphic example of the failure of one of Pandora FMS Enterprise servers:
As explained previously, the balanced components are redirected by the load balancer, while the rest is handled natively by the Pandora FMS Enterprise servers.
Pandora FMS server health view: in this case one of the servers is not working, but the solution remains operational despite the problem.
From the web console, you will see messages informing that one of the servers stopped working:
It is important to know that Pandora FMS will keep on working properly, even if it takes several days to solve the server problem or even if it is necessary to reinstall it.
Pandora FMS Enterprise database in HA. How does it work?
Finally, bearing in mind that the Pandora FMS Enterprise database is the heart of the product, let’s look at how the architecture keeps it always available for Pandora FMS servers.
From the web console, in Servers -> Manage Database HA, you can see the health status of your database in High Availability.
You can see the virtual IP (VIP) 172.16.75.21, which is the IP that Pandora FMS Enterprise connects to in order to work with the database. The server with the Master role is the one that saves data, which is then replicated to the server with the Slave role. It is essential to make sure this replication works properly to avoid data loss.
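As a rough illustration of what "the replica works properly" means, the sketch below classifies a replica from three fields that MySQL-compatible servers report in `SHOW SLAVE STATUS`: `Slave_IO_Running`, `Slave_SQL_Running` and `Seconds_Behind_Master`. It does not describe how the Pandora FMS console evaluates health; the function name and the 60-second threshold are assumptions chosen for the example.

```python
def replica_health(status, max_lag_seconds=60):
    """Classify a replica as 'ok', 'lagging' or 'broken'.

    `status` is one row of SHOW SLAVE STATUS as a dict, for example
    {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
     "Seconds_Behind_Master": 0}.
    """
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if not (io_ok and sql_ok):
        return "broken"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # NULL lag means the replica cannot tell how far behind it is.
        return "broken"
    if int(lag) > max_lag_seconds:
        return "lagging"
    return "ok"
```

A "broken" or persistently "lagging" result is the situation in which a Master failure could lose data, which is why this state is worth alerting on.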
In the diagram you can see a failure in the second database server; in that case, replication is interrupted and adjusts automatically when the server becomes available again.
If the failure takes place on the server with the Master role, the server with the Slave role becomes Master; Pandora FMS Enterprise loses the connection for a short period of time and then becomes available again.
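The role change described above can be modeled with a toy example. This is a simplified sketch of the idea only; the real promotion is handled by the cluster tooling, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DbNode:
    name: str
    role: str  # "master" or "slave"
    alive: bool = True


def failover(nodes):
    """Return the node the VIP should point to, promoting a slave if needed.

    If the current master is alive, nothing changes. Otherwise the first
    live slave is promoted and the failed master is marked as slave, so it
    rejoins as a replica when it comes back.
    """
    for node in nodes:
        if node.role == "master" and node.alive:
            return node
    for candidate in nodes:
        if candidate.alive and candidate.role == "slave":
            for other in nodes:
                if other.role == "master":
                    other.role = "slave"
            candidate.role = "master"
            return candidate
    return None  # no live node left: the database is unavailable
```

Starting with PandoraDB01 as master and PandoraDB02 as slave, marking PandoraDB01 as not alive and calling `failover` returns PandoraDB02 with the master role. The short loss of connection mentioned above corresponds to the time between the failure and this promotion.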
To finish this article, it is worth knowing that Pandora FMS provides database server management tools that allow you to re-synchronize the Master server with the Slave, to set a server on Standby and to configure alerts for any failure in this critical component of Pandora FMS Enterprise running on a High Availability architecture.
Founding partner and CEO of SITS SOLUCIONES.
Restless entrepreneur, technologist by training and innovator by vocation, with more than 25 years of experience in the IT world. Working through digital transformation and creating innovative solutions, he is currently focused on proactive monitoring, cloud and high availability. He contributes to the Pandora FMS communities for Argentina, Chile and Uruguay.