Pandora FMS Engineering

Pandora FMS Engineering Details

This section explains some of the design principles and particularities of Pandora FMS.

Pandora FMS Database Design

The first versions of Pandora FMS, from 0.83 to 1.1, were based on a very simple idea: one piece of data, one insertion in the database. This kept searches, insertions and other operations simple.

Although this design had some advantages, it had one big disadvantage: scalability (the ability to grow quickly with little or no effect on operations and work routines). The system had a specific limit on the maximum number of modules supported, and performance dropped once a significant amount of data (more than 5 million elements) was stored.

On the other hand, solutions based on MySQL Cluster are not easy: even though they can handle a higher load, they bring some problems of their own, and they do not offer a long-term solution to the performance problem under higher data loads either.

The current version of Pandora FMS compresses data in real time on each insertion. It also supports data compression based on interpolation. In addition, the maintenance task can automatically delete data older than a certain period of time.

The new Pandora FMS processing system keeps only «new» data. If a duplicated value enters the system, it is not stored in the database. This is very useful for keeping the database to a minimum, and it works for all Pandora FMS module types: numeric, incremental, boolean, and string. For boolean data the compression ratio is very high, since such data rarely change. Nevertheless, «index» elements are stored every 24 hours, so there is always a minimum of information to use as a reference when compacting.

This system solves part of the scalability problem, reducing database usage by 40%-70%, but there are other ways to increase scalability.

Pandora FMS allows its components to be split up so that data file processing and network module execution are balanced across different servers. It is possible to have several Pandora FMS servers (network, data or SNMP servers), Pandora FMS Web consoles, and also a database or a high performance cluster (with MySQL 5) on different servers.

These adjustments imply big changes when reading or interpreting data. In the latest Pandora FMS versions, the graph engine has been redesigned from scratch so that it can represent data quickly with the new data storage model.

Compression has certain implications when reading and interpreting data graphically. Imagine an agent cannot communicate with Pandora FMS, so the Pandora FMS server does not receive data from that agent, and there is a period of time during which the server has no information from that agent's modules. If you open the graph of one of those modules, the interval with no data is drawn as a horizontal line, as if nothing had changed. If Pandora FMS does not receive new values, it assumes there were no changes and everything looks as it did at the last notification.

As a graphic example, this image shows the changes in a module's data, received every 180 seconds.

module_graph_full.jpg

This would be the equivalent graph for the same data, except for a connection failure from approximately 05:55 to 15:29.

module_graph_peak.jpg

In Pandora FMS 1.3 a new general graph for agents was added. It shows the agent's connectivity and the access rate from the module to the agent. This graph complements the other graphs that are shown when the agent has activity and receives data. This is an example of an agent that had activity and received data:

access_graph_full.jpg

If this graph shows low peaks, there could be problems or slow connections between the Pandora FMS agent and the Pandora FMS server, or connectivity problems from the network server.

Pandora FMS version 5 introduced the possibility of crossing “unknown module” events with the graphs, so that the graph shows the data in unknown status and is easier to understand. For example:

grafica-dsconocido.jpg

In version 7 NG 759 there is a graph configuration menu that allows adding percentiles, real-time data, and markers for when events and/or alerts took place, in addition to other options.

Other DB technical aspects

Throughout software updates, small improvements have been made to the relational model of the Pandora FMS database. One of the changes is indexing information by module type. That way, Pandora FMS can access information more quickly, since it is broken down into different tables.

Tables can be partitioned (by timestamp) to further improve access performance for historical data.

In addition, factors such as the numerical representation of timestamps (in UNIX timestamp format) speed up date range searches, date comparisons, etc. This work has significantly improved search and insertion times.
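
As a minimal sketch of why integer timestamps help, the following Python snippet converts a date to a UNIX timestamp once and then filters rows with plain integer comparisons. The row layout and the helper names (to_unix, rows_in_range) are hypothetical and are not taken from the Pandora FMS schema.

```python
from datetime import datetime, timezone


def to_unix(moment):
    """Convert a timezone-aware datetime to an integer UNIX timestamp."""
    return int(moment.timestamp())


def rows_in_range(rows, start, end):
    """Select (unix_timestamp, value) rows with start <= timestamp < end."""
    return [row for row in rows if start <= row[0] < end]


# Hypothetical rows around midnight UTC on April 18, 2020.
day_start = to_unix(datetime(2020, 4, 18, tzinfo=timezone.utc))
rows = [(day_start - 300, 1.0), (day_start, 2.0), (day_start + 300, 3.0)]

# Only the two rows that fall inside the day are returned.
print(rows_in_range(rows, day_start, day_start + 86400))
```

Because both bounds are integers, the range check needs no date parsing per row.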

Database Main Tables

You may get more information about the Pandora FMS database structure (as of April 18, 2020) by following this link.

This is an ER diagram and also a detailed description of the main tables of Pandora FMS database.

Real time data compression

To avoid overloading the database, the server performs a simple compression at insertion time. A piece of data is not stored in the database unless it differs from the previous one or 24 hours have passed between them.

For example, over an interval of around 1 hour, the sequence 0,1,0,0,0,0,0,0,1,1,0,0 is saved in the database as 0,1,0,1,0. A consecutive 0 will not be saved unless 24 hours have passed.

The graph shown here has been drawn from the data of the previous example. Only the data in red have been inserted in the database.
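
The insertion rule can be sketched in a few lines of Python. This is an illustration only, not the server's code: the function name compress_on_insert is hypothetical, the 300-second spacing is an assumption chosen so that 12 samples span about one hour, and the 24-hour limit is assumed to be measured from the last stored row.

```python
DAY_SECONDS = 24 * 60 * 60  # the 24-hour limit described in the text


def compress_on_insert(samples, max_gap=DAY_SECONDS):
    """Return only the (timestamp, value) pairs that would be stored.

    A sample is kept when its value differs from the last stored value,
    or when at least max_gap seconds have passed since the last stored row
    (assumption: the gap is measured from the last stored row).
    """
    stored = []
    for timestamp, value in samples:
        if not stored:
            stored.append((timestamp, value))
            continue
        last_timestamp, last_value = stored[-1]
        if value != last_value or timestamp - last_timestamp >= max_gap:
            stored.append((timestamp, value))
    return stored


# The sequence from the example, one sample every 300 seconds (assumed).
values = [0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
samples = [(i * 300, v) for i, v in enumerate(values)]
stored_values = [v for _, v in compress_on_insert(samples)]
print(stored_values)  # [0, 1, 0, 1, 0]
```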

Compression affects the data processing algorithms, for both metrics and graphs. It is important to keep in mind that the blanks caused by compression must be filled in.
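
One way to fill those blanks is to repeat the last stored value for every interval that has no row of its own. The sketch below assumes that approach. The function name expand_compressed is hypothetical, and the stored rows are the ones that the sequence 0,1,0,0,0,0,0,0,1,1,0,0 would leave behind with an assumed interval of 300 seconds.

```python
def expand_compressed(stored, start, end, interval):
    """Rebuild one value per interval from compressed (timestamp, value) rows.

    stored must be sorted by timestamp. Each slot takes the most recent
    stored value at or before its timestamp; slots that precede the first
    stored row are None.
    """
    series = []
    index = 0
    current = None
    for t in range(start, end, interval):
        # Consume every stored row that is not later than this slot.
        while index < len(stored) and stored[index][0] <= t:
            current = stored[index][1]
            index += 1
        series.append((t, current))
    return series


stored = [(0, 0), (300, 1), (600, 0), (2400, 1), (3000, 0)]
rebuilt = [v for _, v in expand_compressed(stored, 0, 3600, 300)]
print(rebuilt)  # [0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
```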

Considering all of this, to perform calculations on the data of a module, given a certain interval and the starting data, you should follow these steps:

Data compression

Pandora FMS now includes a system to compress database information. This system is aimed at small and medium-sized deployments (250-500 agents, < 100,000 modules) that need a long information history and can accept losing some detail.

Pandora FMS database maintenance, which runs every hour, compacts old data among other cleaning tasks. Compaction uses a simple linear interpolation, which means that if there are 10,000 data points in a single day, the process reduces those 10,000 points to 1,000 points.
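
The exact algorithm used by the maintenance script is not described here, so the following Python sketch shows only one possible reading of "simple linear interpolation": the data are resampled onto evenly spaced timestamps, and each new value is interpolated between its two neighbouring original points. The function name downsample_linear and the sample data are hypothetical.

```python
def downsample_linear(points, target_count):
    """Resample sorted (timestamp, value) points onto target_count evenly
    spaced timestamps, interpolating linearly between neighbours."""
    if target_count < 2 or len(points) <= target_count:
        return list(points)
    first_t, last_t = points[0][0], points[-1][0]
    step = (last_t - first_t) / (target_count - 1)
    result = []
    index = 0
    for k in range(target_count):
        t = first_t + k * step
        # Advance to the segment [points[index], points[index + 1]] around t.
        while index < len(points) - 2 and points[index + 1][0] < t:
            index += 1
        (t0, v0), (t1, v1) = points[index], points[index + 1]
        ratio = (t - t0) / (t1 - t0) if t1 != t0 else 0.0
        result.append((t, v0 + ratio * (v1 - v0)))
    return result


# 10,000 made-up points spread over one day, reduced to 1,000.
points = [(i * 8.64, float(i % 100)) for i in range(10000)]
compacted = downsample_linear(points, 1000)
print(len(compacted))  # 1000
```

Short-lived peaks that fall between two resampled timestamps disappear, which is the loss of detail that compaction trades for a smaller database.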

Since it is an interpolation, some detail is lost, but the result is good enough for monthly, yearly, etc., reports and graphs.

In big databases, this could be “expensive” in terms of database performance, and it should be disabled. You may use the history database model instead.

History database

This is an Enterprise feature, used to store old information that is not needed in daily views, for example, data that are one month old. These data are automatically migrated to a different database, which must be on a different server with a different hard drive from that of the main database.

When a graph or report containing old data is shown, Pandora FMS will look for the first days in the main database, and on reaching the point at which data were migrated to the history database, it will search there. Thanks to that, performance is optimized even when a large amount of information is stored in the system.
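
The lookup described above can be pictured as splitting the requested time range at the migration cutoff. The Python sketch below is an assumption about how such a split could work, not Pandora FMS code. The function split_range and the history_days parameter are hypothetical names.

```python
DAY_SECONDS = 86400


def split_range(start, end, now, history_days):
    """Split the range [start, end] at the migration cutoff.

    Returns (history_part, main_part). Each part is a (start, end) tuple,
    or None when that database does not need to be queried.
    """
    cutoff = now - history_days * DAY_SECONDS
    history_part = (start, min(end, cutoff)) if start < cutoff else None
    main_part = (max(start, cutoff), end) if end > cutoff else None
    return history_part, main_part


now = 100 * DAY_SECONDS
# A report covering days 60 to 90, with data older than 30 days migrated.
history_part, main_part = split_range(60 * DAY_SECONDS, 90 * DAY_SECONDS, now, 30)
print(history_part)  # (5184000, 6048000): days 60 to 70, history database
print(main_part)     # (6048000, 7776000): days 70 to 90, main database
```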

To configure this, install the history database manually on another server (importing the Pandora FMS schema, without data), and grant permissions so that the main Pandora FMS server can access it.

Go to Setup > Setup > Historical database and configure the settings to access the history database there.

Advanced setup

The default Pandora FMS configuration does NOT transfer string data to the historical database. However, if this configuration has been modified and the historical database is receiving this type of information, it is essential to configure its purging; otherwise it will end up taking too much time, causing big problems and hurting performance.

To configure this parameter, you must run a query directly in the database to set the number of days after which this information will be purged. The key table here is tconfig and the token is string_purge. To purge this type of information after 30 days, run this query directly on the historical database:

UPDATE tconfig SET value = 30 WHERE token = "string_purge";

The database is maintained by a script named pandora_db.pl.

A good way to test whether database maintenance is correctly executed is running the script manually:

/usr/share/pandora_server/util/pandora_db.pl /etc/pandora/pandora_server.conf

It should not report any errors. If another instance is using the database, you may use the -f option to force execution; with the -p parameter, the data are not compacted. This is especially useful in High Availability environments with historical databases, as the script makes sure that the necessary steps for these components are performed in the correct order and mode.

Pandora FMS module status

In Pandora FMS, modules can have different statuses: Unknown, Normal, Warning, Critical or with Triggered Alerts.

When is each status set?

Spreading and priority

Within the Pandora FMS organization, some elements depend on others, as is the case with agent modules or group modules. The same applies to Pandora FMS Enterprise policies, which have certain agents associated with them, along with modules that are considered to be associated with each agent.

This structure is especially useful for evaluating module status at a glance. It works by propagating the status upwards through the organization schema, so that agents, groups and policies take on that status.

Which status does the agent have?

An agent has the worst status among its modules. In turn, a group has the worst status among its agents, and the same goes for policies, which have the worst status among their assigned agents.

That way, when you see a group with a critical status, for example, you will know that at least one of its agents has the same status. To locate it, go down to the next level, that of the agents, to narrow down the module or modules that caused the critical status to spread.

Which should be the status priority?

Saying that the worst status is spread requires being clear about which statuses are the most important ones. Therefore, there is a priority list: the first status has priority over all the others, and the last one has the lowest priority and is shown only when all elements have it.

  1. Triggered Alerts.
  2. Critical status.
  3. Warning status.
  4. Unknown status.
  5. Normal status.

You may see that when a module has triggered alerts, its status has priority over the rest, and the agent and group it belongs to will have this status.

On the other hand, in order for one group to be in normal status, all its agents must have that status; which implies that all the group modules will be in normal status.
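
The propagation rule and the priority list can be sketched as follows. This is an illustration under assumptions: the status identifiers and the function names are hypothetical, and only the ordering comes from the priority list in this section.

```python
# Priority order from the list in this section: a lower index wins.
PRIORITY = ["alert_fired", "critical", "warning", "unknown", "normal"]


def worst_status(statuses):
    """Return the highest-priority status found in the collection.

    Raises ValueError if the collection is empty.
    """
    return min(statuses, key=PRIORITY.index)


def agent_status(module_statuses):
    """An agent takes the worst status among its modules."""
    return worst_status(module_statuses)


def group_status(agents):
    """A group takes the worst status among its agents.

    agents maps an agent name to the list of its module statuses.
    """
    return worst_status(agent_status(mods) for mods in agents.values())


group = {
    "web01": ["normal", "warning"],
    "db01": ["normal", "critical", "unknown"],
}
print(agent_status(group["web01"]))  # warning
print(group_status(group))           # critical
```

A group reports normal only when every module of every agent is normal, which matches the last rule above.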

Color Code

Each of the statuses mentioned has a color assigned, in order to easily see in the network maps when something does not work properly.

Fired alerts status

Critical status

Warning status

Unknown status

Normal status

Pandora FMS graphs

Graphs are one of the most complex Pandora FMS implementations, because they gather information in real-time from the DB, and no external system is used (RRDtool or similar).

Graphs behave in several ways according to the type of source data:

Compression

Compression affects how graphs are represented. When two consecutive pieces of data with the same value are received, Pandora FMS does not store the last one. If there is no reference value when a graph is drawn, Pandora FMS searches up to 48 hours back in time to find the last known value and take it as the reference. If it does not find anything, it starts from 0.
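
A minimal Python sketch of this backward search, assuming the stored rows are available as a sorted list of (timestamp, value) pairs. The function name reference_value is hypothetical; the 48-hour window and the fallback to 0 come from the paragraph above.

```python
LOOKBACK_SECONDS = 48 * 60 * 60


def reference_value(stored, graph_start, lookback=LOOKBACK_SECONDS, default=0):
    """Find the value a graph should start from.

    stored is a list of (timestamp, value) rows sorted by timestamp. The
    most recent row inside [graph_start - lookback, graph_start] is used.
    If no row falls in that window, the default (0) is returned.
    """
    for timestamp, value in reversed(stored):
        if timestamp > graph_start:
            continue  # row is inside the graph itself, not before it
        if timestamp < graph_start - lookback:
            break  # everything earlier is older than the window
        return value
    return default


stored = [(0, 7), (1000, 9)]
print(reference_value(stored, 2000))                     # 9
print(reference_value(stored, 1000 + LOOKBACK_SECONDS))  # 9, just inside the window
print(reference_value(stored, 1001 + LOOKBACK_SECONDS))  # 0, last row is too old
```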

In asynchronous modules, although there is no compression, the backwards search algorithm behaves similarly.

Interpolation

When making a graph, Pandora FMS takes 50xN samples, where N is the graph resolution factor, which can be configured in the setup (it is 3 by default). For example, a monitor that returns data every 300 seconds (5 minutes) generates 12 samples per hour, and 288 samples (12*24) a day. So a day graph does not represent 288 values; instead they are “compressed” into only 150 (50*3) samples.

This means some resolution is lost, and the more samples you have, the more you lose. This can be avoided by creating the graph with a different interval or zoom.

In normal graphs, interpolation is implemented in a simple way: if there are two samples within an interval (e.g. interval B of the example), their average is calculated and represented.

In boolean graphs, if there are several data within a sample (in this case only 1 or 0), 0 will be shown. This helps to visualize failures within an interval, giving it priority over the normal status.

In both cases, if there is no data within a sample (because it is compressed or because it is missing), the last known value of the previous interval will be used, like interval E of the example above.
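
The three rules above (average for normal graphs, 0 taking priority in boolean graphs, and reuse of the last known value for empty samples) can be sketched in Python as follows. The function name build_slots and its parameters are hypothetical, and the real graph engine may differ in its details.

```python
def build_slots(points, start, end, resolution_factor=3, boolean=False):
    """Reduce (timestamp, value) points to 50 * resolution_factor slot values.

    Numeric slots hold the average of their points. Boolean slots hold 0 if
    any of their points is 0. Empty slots repeat the previous slot's value.
    """
    slot_count = 50 * resolution_factor
    width = (end - start) / slot_count
    buckets = [[] for _ in range(slot_count)]
    for timestamp, value in points:
        if start <= timestamp < end:
            position = min(int((timestamp - start) / width), slot_count - 1)
            buckets[position].append(value)

    slots = []
    previous = None  # None until the first slot that contains data
    for bucket in buckets:
        if bucket:
            previous = min(bucket) if boolean else sum(bucket) / len(bucket)
        slots.append(previous)
    return slots


# One day of made-up data, one point every 300 seconds: 288 points.
points = [(t, t // 300) for t in range(0, 86400, 300)]
slots = build_slots(points, 0, 86400)
print(len(points), len(slots))  # 288 150
print(slots[0])                 # 0.5, the average of the two points in slot 0
```

With the default factor of 3, each slot of a day graph covers 576 seconds, so most slots average two of the 300-second points.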

Avg/Max/Min

The graphs show the average, maximum and minimum values by default. Because a sample can contain several data, it has an average, a maximum and a minimum value. The more interpolation is needed (the longer the display period and the more data you have), the larger the difference between maximum and minimum values will be.

If the graph range is short (an hour or so), there is no interpolation, or it is minimal, so you see the data at their real resolution and the three series are identical.
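
A short Python illustration of why the three series diverge on long ranges and coincide on short ones. The function name slot_stats and the values are made up for the example.

```python
def slot_stats(bucket):
    """Return (average, maximum, minimum) for the values in one sample."""
    return sum(bucket) / len(bucket), max(bucket), min(bucket)


wide_slot = [10, 40, 25, 5]  # long range: several values share one sample
narrow_slot = [25]           # short range: one value per sample

print(slot_stats(wide_slot))    # (20.0, 40, 5)
print(slot_stats(narrow_slot))  # (25.0, 25, 25)
```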