Go back to documentation index


1 Pandora FMS Engineering Details

This section explains some of the design principles and particularities of Pandora FMS.

1.1 Pandora FMS Database Design

Pandora FMS first versions, from version 0.83 to version 1.1, were based on a very simple idea: one piece of data, one insertion in the database. This allowed the program to perform simple searches, insertions and other operations.

Although this approach had some advantages, it had a big disadvantage: scalability. The system had a specific limit on the maximum number of supported modules, and with a significant amount of data (more than 5 million elements), performance decreased.

On the other hand, solutions based on a MySQL cluster are not easy: even though they allow handling a higher load, they bring some minor problems of their own, and they do not offer a long-term solution to this performance problem as the data load grows.

The current version of Pandora FMS compresses data in real time at insertion time. It also allows data compression based on interpolation. The maintenance task also automatically deletes data older than a certain period of time.

The new Pandora FMS processing system keeps only «new» data. If a duplicated value enters the system, it will not be stored in the database. This is very useful to keep the database size to a minimum, and it works for all Pandora FMS module types: numeric, incremental, boolean and string. For boolean data the compression ratio is very high, since these are data that rarely change. Nevertheless, «index» elements are stored every 24 hours, so there is a minimum amount of information that is used as a reference when compacting the information.

This system solves part of the scalability problem, reducing database usage by 40%-70%, but there are other ways to increase scalability. Pandora FMS allows its components to be split up to balance data file processing and network module execution across different servers. It is possible to have several Pandora FMS servers (network, data or SNMP), several Pandora FMS Web consoles, and also a database or a high-performance cluster (with MySQL 5) on different servers.

These adjustments imply big changes when reading or interpreting data. In the latest Pandora FMS versions, the graphic engine has been redesigned from scratch to be able to represent data quickly with the new data storage model. The compression process has certain implications when reading and interpreting data graphically. Imagine an agent cannot communicate with Pandora FMS, so the Pandora FMS server does not receive data from that agent, and there is a period of time during which the server has no information from that agent's modules. If you access the graph of one of those modules, the interval with no data will be represented as not suffering any changes, as a horizontal line: if Pandora FMS does not receive new values, it assumes there were no changes and everything looks as it did at the last notification.

To see a graphic example, this image shows the changes for each piece of data, received every 180 seconds.

Module graph full.jpg

This would be the equivalent graph for the same data, except for a connection failure from 05:55 to 15:29 approximately.

Module graph peak.jpg

In Pandora FMS 1.3 a new general graph for agents was added. It shows agent connectivity and the access rate from the modules to the agent. This graph complements the other graphs that are shown when the agent has activity and receives data. This is an example of an agent that is regularly connected to the server:

Access graph full.jpg

If there are (low) peaks in this graph, there could be problems or slow connections between the Pandora FMS agent and the Pandora FMS server, or connectivity problems from the network server.

From Pandora FMS version 5 onwards, a new feature makes it possible to cross-reference the "unknown module" events with the graphs, so that the data in unknown status are shown on the graph itself for better understanding, for example:


Grafica-dsconocido.jpg


1.1.1 Other DB technical aspects

Throughout software updates, small improvements have been made to the relational model of the Pandora FMS database. One of the changes is indexing by module type. That way, Pandora FMS can access information more quickly, since it is broken down into different tables. Tables can also be partitioned (by timestamp) to further improve access performance on historical data.

In addition, factors such as the numerical representation of timestamps (in UNIX timestamp format) speed up date range searches and comparisons. This work has allowed a significant improvement in search and insertion times.
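
As a rough illustration of this point (a minimal sketch, not Pandora FMS code; the table and column names are only examples inspired by the list below and may differ in a real schema), a date range expressed as UNIX timestamps turns into a plain numeric comparison:

  # Minimal sketch: date ranges become plain integer comparisons on utimestamp.
  from datetime import datetime, timezone

  def to_utimestamp(dt):
      # Seconds since the UNIX epoch, assuming the date is given in UTC.
      return int(dt.replace(tzinfo=timezone.utc).timestamp())

  start = to_utimestamp(datetime(2020, 2, 1))
  end = to_utimestamp(datetime(2020, 2, 27))

  # Illustrative query only: column names may differ in a real installation.
  query = ("SELECT utimestamp, data FROM tagent_data "
           "WHERE id_agent_module = %s AND utimestamp BETWEEN %s AND %s")
  print(query % (42, start, end))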

1.1.2 Database Main Tables

Below is an ER diagram together with a detailed description of the main tables of the Pandora FMS database. A sample query against some of these tables is shown after the list.

Pandora db eer.png

  • taddress: It contains additional agent addresses.
  • taddress_agent: Addresses linked to an agent (rel. taddress/tagent).
  • tagente: It contains the information of Pandora FMS agents.
    • id_agent: Agent unique identifier.
    • name: Agent name (case sensitive).
    • address: Agent address. It is possible to assign additional addresses through taddress.
    • comentarios: Free text.
    • id_group: Identifier of the group the agent belongs to (ref. tgrupo).
    • last_contact: Last agent contact date, either through a software agent or through a remote module.
    • mode: Running agent mode, 0 normal, 1 training.
    • interval: Agent execution interval. Depending on this interval, the agent will be shown as out of limits.
    • id_os: Agent OS identifier (ref. tconfig_os).
    • os_version: OS version (free text).
    • agent_version: Agent version (free text). Updated by software agents.
    • last_remote_contact: Last agent-received contact date. In case of software agents, and unlike last_contact, the date is sent by the agent itself.
    • disabled: Agent status, enabled (0) or disabled (1).
    • id_parent: Identifier of the agent parent (ref. tagent).
    • custom_id: Agent custom identifier. Useful to interact with other tools.
    • server_name: Name of the server the agent is assigned to.
    • cascade_protection: Cascade protection. Disabled at 0. When at 1, it prevents agent-associated alerts from being triggered if a critical agent parent alert was triggered. For more info, check the section about Alerts.
  • tagent_data: Data received from each module. If for the same module the last received data is the same as the previous one, it will not be added (but tagent_status is updated). The incremental and string type data are saved in different tables.
  • tagent_data_inc: Incremental data type.
  • tagent_data_string: String data type.
  • tagent_status: Information of the current status of each module.
    • id_agent_status: Identifier.
    • id_agent_module: Module identifier (ref. tagent_module).
    • data: Value of the last received data.
    • timestamp: Date of the last data received (it could come from the agent).
    • status: Module status: 0 NORMAL, 1 CRITICAL, 2 WARNING, 3 UNKNOWN.
    • id_agent: Agent identifier associated to the module (ref. tagent).
    • last_try: Date of the module's last successful execution.
    • utimestamp: Date of the module's last execution in UNIX format.
    • current_interval: Module execution interval in seconds.
    • running_by: Name of the server that executed the module.
    • last_execution_try: Date of the last module execution try. The execution could have failed.
    • status_changes: Number of status changes. It is used to avoid continuous status changes. For more info, check out the Operation section.
    • last_status: Previous module status.
  • tagent_module: Module configuration.
    • id_agent_module: Module unique identifier.
    • id_agente: Agent identifier associated to the module (ref. tagent).
    • id_tipe_module: Type of module (ref. ttipo_modulo).
    • description: Free text.
    • name: Module name.
    • max: Module maximum value. Data higher than this value will not be valid.
    • min: Module minimum value. Data lower than this value will not be valid.
    • module_interval: Module execution interval in seconds.
    • tcp_port: Destination TCP port in network modules and plugins. Name of the column to read in WMI modules.
    • tcp_send: Data to send in network modules. Namespace in WMI modules.
    • tcp_rcv: Expected answer in network modules.
    • snmp_community: SNMP community in network modules. Filter in WMI modules.
    • snmp_oid: OID in network modules. WQL Query in WMI modules.
    • ip_target: Destination address in network modules, plugin and WMI.
    • id_module_group: Identifier of the group the module belongs to (ref. tmodule_group).
    • flag: Forced execution flag. If set to 1, the module will be executed even if it is not yet due according to its interval.
    • id_modulo: Identifier for modules that cannot be recognized by their id_module_type. 6 for WMI modules, 7 for WEB modules.
    • disabled: Module status, 0 enabled, 1 disabled.
    • id_export: Identifier of the export server associated to the module (ref. tserver).
    • plugin_user: Username in plugin and WMI modules, user-agent in Web modules.
    • plugin_pass: Password in plugin modules and WMI, number of retries in Web modules.
    • plugin_parameter: Additional parameters in plugin modules, configuration of Goliat task in Web modules.
    • id_plugin: Identifier of the plugin associated to the module in plugin modules (ref. tplugin).
    • post_process: Value the module data will be multiplied by before being saved.
    • prediction_module: 1 if it is a prediction module, 2 if it is a service module, 3 if it is synthetic and 0 in any other case.
    • max_timeout: Waiting time in seconds for plugin modules.
    • custom_id: Module customized identifier. Useful to interact with other tools.
    • history_data: If it is set at 0, module data will not be saved at tagent_data*, only tagent_status will be updated.
    • min_warning: Minimum value that activates the WARNING status.
    • max_warning: Maximum value that activates the WARNING status.
    • min_critical: Minimum value that activates the CRITICAL status.
    • max_critical: Maximum value that activates the CRITICAL status.
    • min_ff_event: Number of times the status change condition must be met before the change actually takes place. It is related to tagent_status.status_changes.
    • delete_pending: If it is set to 1, it will be deleted by the pandora_db.pl database maintenance script.
    • custom_integer_1: When prediction_module equals 1, this field is the module id from where data for predictions are obtained. When prediction_module equals 2, this field is the service id assigned to the module.
    • custom_integer_2:
    • custom_string_1:
    • custom_string_2:
    • custom_string_3:
  • tagent_access: A new entry will be added each time data are received from an agent by any server, but never more than one per minute, to avoid overloading the database. It can be disabled by setting agentaccess to 0 in the pandora_server.conf configuration file.
  • talert_snmp: SNMP alert configuration.
  • talert_commands: Commands that can be executed from actions associated to an alert (e.g. send mail).
  • talert_actions: Command instance associated to any alert (e.g. send mail to administrator).
  • talert_templates: Alert templates.
    • id: Template unique identifier.
    • name: Template name.
    • description: Description.
    • id_alert_action: Identifier of the default action linked to the template.
    • field1: Customized field 1 (free text).
    • field2: Customized field 2 (free text).
    • field3: Customized field 3 (free text).
    • type: Type of alert according to the triggering term ('regex', 'max_min', 'max', 'min', 'equal', 'not_equal', 'warning', 'critical').
    • value: Value for regex type alerts (free text).
    • matches_value: When set to 1, it inverts the logic of the triggering term.
    • max_value: Maximum value for max_min and max alerts.
    • min_value: Minimum value for max_min and min alerts.
    • time_threshold: Alert interval.
    • max_alerts: Maximum number of times an alert will be triggered during an interval.
    • min_alerts: Minimum number of times that the triggering term must be met during an interval for the alert to be triggered.
    • time_from: Time from which the alert will be active.
    • time_to: Time until which the alert will be active.
    • monday: When set to 1, the alert is active on Mondays.
    • tuesday: When set to 1, the alert will be active on Tuesdays.
    • wednesday: When set to 1, the alert will be active on Wednesdays.
    • thursday: When set to 1, the alert will be active on Thursdays.
    • friday: When set to 1, the alert will be active on Fridays.
    • saturday: When set to 1, the alert will be active on Saturdays.
    • sunday: When set to 1, the alert will be active on Sundays.
    • recovery_notify: When set to 1, it enables alert recovery.
    • field2_recovery: Custom field 2 for alert recovery (free text).
    • field3_recovery: Custom field 3 for alert recovery (free text).
    • priority: Alert priority: 0 Maintenance, 1 Informational, 2 Normal, 3 Warning, 4 Critical.
    • id_group: Identifier of the group the template belongs to (ref. tgrupo).
  • talert_template_modules: Instance of an alert template associated to a module.
    • id: Alert unique identifier.
    • id_agent_module: Identifier of the module linked to the alert (ref. tagente_modulo).
    • id_alert_template: Identifier of the template associated to the alert (ref. talert_templates).
    • internal_counter: Number of times that the alert triggering term was met.
    • last_fired: Last time the alert was triggered (Unix time)
    • last_reference: Start of the current interval (Unix time).
    • times_fired: Number of times the alert was triggered (it could be different from internal_counter)
    • disabled: When set to 1, the alert is disabled.
    • priority: Alert priority : 0 Maintenance, 1 Informational, 2 Normal, 3 Warning, 4 Critical.
    • force_execution: When set to 1, the action of the alert will be executed even though it has not been triggered. It is used for manual alert execution.
  • talert_template_module_actions: Instance of an action associated to an alert (ref. talert_template_modules).
  • talert_compound: Compound alerts, the columns are similar to those of talert_templates.
  • talert_compound_elements: Simple alerts associated to a compound alert, each one with its correspondent logic operation (ref. talert_template_modules).
  • talert_compound_actions: Actions associated to a compound alert (ref. talert_compound).
  • tattachment: Attachments associated to an incident.
  • tconfig: Console configuration.
  • tconfig_os: Valid operating systems in Pandora FMS.
  • tevento: Event entries. Priority values are the same as those of alerts.
  • tgrupo: Groups defined in Pandora FMS.
  • tincidencia: Incident entries.
  • tlanguage: Available languages in Pandora FMS.
  • tlink: Links shown at the lower side of the console menu.
  • tnetwork_component: Network components. They are modules associated to a network profile used by the Recon Server. Afterwards they result in an entry in tagent_module, so the columns of both tables are similar.
  • tnetwork_component_group: Groups to classify network components.
  • tnetwork_profile: Network profile. Network component group that will be assigned to recognition tasks of the Recon Server. The network components associated to the profile will result in modules in the created agents.
  • tnetwork_profile_component: Network components associated to a Network profile (rel. tnetwork_component/tnetwork_profile).
  • tnota: Notes associated to an incident.
  • torigen: Possible sources of an incident.
  • tperfil: User profiles defined in the console.
  • tserver: Registered servers.
  • tsesion: Information on actions that took place during a user session, for administration and statistical logs.
  • ttype_module: Module types depending on their source and kind of data.
  • ttrap: SNMP traps received by the SNMP console.
  • tuser: Console-registered users.
  • tuser_profile: User-associated profiles (rel. tuser/tprofile).
  • tnews: News shown on the console.
  • tgraph: Custom graphs created in the console.
  • tgraph_source: Modules associated to a graph (rel. tgraph/tagente_modulo).
  • treport: Custom reports created in the console.
  • treport_content: Elements associated to a certain report.
  • treport_content_sla_combined: Components of an SLA element associated to a certain report.
  • tlayout: Custom maps created in the console.
  • tlayout_data: Elements associated to a map.
  • tplugin: Plugin definitions for the Plugin Server.
  • tmodule: Module types (Network, Plugin, WMI...).
  • tserver_export_data: Data to export, associated to a destination.
  • tplanned_downtime: Planned downtimes.
  • tplanned_downtime_agents: Agents associated to a planned downtime (rel. tplanned_downtime/tagent).
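
As a hedged example of how these tables relate (a sketch only, using the table and column names listed above, which may differ slightly in an actual installation; the connection parameters and agent id are placeholders), the current status of every module of an agent can be listed by joining tagent_module and tagent_status:

  # Sketch: list the current status of each module of one agent.
  import mysql.connector  # pip install mysql-connector-python

  STATUS = {0: "NORMAL", 1: "CRITICAL", 2: "WARNING", 3: "UNKNOWN"}

  conn = mysql.connector.connect(host="localhost", user="pandora",
                                 password="secret", database="pandora")
  cur = conn.cursor()
  cur.execute(
      "SELECT m.name, s.data, s.status, s.utimestamp "
      "FROM tagent_module m "
      "JOIN tagent_status s ON s.id_agent_module = m.id_agent_module "
      "WHERE m.id_agente = %s",
      (1,))  # placeholder agent id
  for name, data, status, utimestamp in cur.fetchall():
      print(name, data, STATUS.get(status, status), utimestamp)
  conn.close()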

1.1.3 Real time data compression

To avoid overloading the database, the server performs a simple compression at insertion time. A piece of data will not be stored in the database unless it is different from the previous one or there is a time difference of 24 hours between them.

For example, with an interval of around one hour, the sequence 0,1,0,0,0,0,0,0,1,1,0,0 is saved in the database as 0,1,0,1,0. A repeated value will not be saved again unless 24 hours have passed.

The graph shown here has been drawn from the data of the previous example. Only the data in red have been inserted in the database.

Data compression 01.png
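
A minimal sketch of this insertion rule (illustrative only, not the actual server code) that reproduces the sequence from the example above:

  # Sketch of the insertion-time compression rule: store a sample only if its
  # value differs from the last stored one or 24 hours have passed since then.
  DAY = 24 * 3600

  def compress(samples):
      # samples: list of (utimestamp, value) pairs; returns the entries kept.
      stored = []
      for ts, value in samples:
          if not stored:
              stored.append((ts, value))
              continue
          last_ts, last_value = stored[-1]
          if value != last_value or ts - last_ts >= DAY:
              stored.append((ts, value))
      return stored

  # The sequence from the example, one sample per hour:
  values = [0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
  samples = [(i * 3600, v) for i, v in enumerate(values)]
  print([v for _, v in compress(samples)])  # prints [0, 1, 0, 1, 0]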

Compression affects data processing algorithms, for both metrics and graphs, and it is important to keep in mind that the gaps caused by compression must be filled in.

Considering all of this, in order to perform calculations on the data of a module over a given interval and start date, follow these steps (a sketch of this logic is shown after the list):

  • Search for the previous piece of data, before the given interval and date. If it exists, place it at the beginning of the range. If it does not exist, there is no data.
  • Look for the next piece of data after the given range, up to a maximum of one module interval. If it exists, place it at the end of the interval. If not, extend the last available value until the end of the interval.
  • All data should be checked, considering that a piece of data is valid until a different piece of data is received.
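
A rough sketch of these steps (illustrative only; the second step, which looks up to one interval past the end of the range, is omitted for brevity):

  # Sketch: rebuild one value per module interval from compressed samples.
  def expand(samples, start, end, interval):
      # samples: (utimestamp, value) pairs sorted by time (compressed data).
      previous = [v for ts, v in samples if ts <= start]
      last_value = previous[-1] if previous else None   # step 1
      inside = [(ts, v) for ts, v in samples if start < ts < end]
      result = []
      t = start
      while t < end:
          while inside and inside[0][0] <= t:
              last_value = inside.pop(0)[1]             # step 3: a value stays
          result.append((t, last_value))                # valid until it changes
          t += interval
      return result

  samples = [(0, 0), (3600, 1), (7200, 0), (28800, 1)]
  print(expand(samples, start=7200, end=18000, interval=3600))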

1.1.4 Data compression

Pandora FMS now includes a system to "compress" database information. This system is aimed at small and medium-sized deployments (250-500 agents, fewer than 100,000 modules) that need to keep a long information history while "losing" some resolution.

Pandora FMS database maintenance, which is executed every hour, compacts old data among other cleaning tasks. Compression is done using a simple linear interpolation, which means that if there are 10,000 data points in a single day, the process will reduce them to 1,000 points.

Since it is an interpolation, some detail is lost from that information, but it is enough for monthly, yearly and similar reports and graphs.
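
As an illustration of the idea (a simplified sketch; this is not the exact algorithm used by the maintenance task), reducing 10,000 points of one day to 1,000 by linear interpolation could look like this:

  # Sketch: reduce a dense series to fewer points by linear interpolation.
  import numpy as np

  def downsample(timestamps, values, target_points):
      # Interpolate the series onto target_points evenly spaced timestamps.
      new_ts = np.linspace(timestamps[0], timestamps[-1], target_points)
      return new_ts, np.interp(new_ts, timestamps, values)

  ts = np.linspace(0, 86400, 10000)    # 10,000 samples over one day
  vals = np.sin(ts / 3600.0)           # placeholder data
  new_ts, new_vals = downsample(ts, vals, 1000)
  print(len(new_vals))                 # prints 1000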

In big databases, this could be "expensive" in terms of database performance, and it should be disabled. You may use the history database model instead.

1.1.5 History database

This is an Enterprise feature, and it is used to store old information that is not used in daily views, for example, one-month-old data. These data are automatically migrated to a different database that must be in a different physical server (not virtual) with a hard drive different to that of the main database.

When a graph or report containing old data is shown, Pandora FMS will look for the first days in the main database, and when reaching the point when data are migrated to the history database, it will search there. Thanks to that, performance is optimized even when storing a high amount of information within the system.

To configure this, install the history database manually on another server (importing the Pandora FMS schema, without data), and set up permissions to allow access to it from the main Pandora FMS server.

Go to Setup > History database and configure there the settings to access the history database.



Bbddhist.png



Here are some interesting settings explained:

  • Days: Maximum number of days the information is kept in the main database. After that, data will be moved to the history database. 30 days is a good option.
  • Rate: This works like a buffer. The database maintenance script will take this many records from the main database, add them to the history database and delete them from the main database. This is time-consuming, and the right size depends on your setup; 1000 is a good default value.
  • Delay: After each block of records is transferred, the script will wait for the amount of time specified in delay (in seconds). It is useful if your database performance is poor, to avoid locks. Use values between 1 and 5.

The default Pandora FMS configuration does NOT transfer string data to the historical database. However, if this configuration has been modified and the historical database is receiving this type of information, it is essential to configure its purging; otherwise it will end up taking up too much space, causing big problems, besides having a negative impact on performance.

To configure this parameter, you must run a query directly in the database to set the number of days after which this information will be purged. The key table here is tconfig and the field is string_purge. If you wanted to set 30 days for purging this type of information, this query should be run directly on the historical database:

UPDATE tconfig SET value = 30 WHERE token = "string_purge";

A good way to test whether database maintenance is correctly executed is running the script manually:

/usr/share/pandora_server/util/pandora_db.pl /etc/pandora/pandora_server.conf

It should not report any error.

1.2 Pandora FMS module status

In Pandora FMS, modules can have different status: Unknown, Normal, Warning, Critical or with Triggered Alerts.

1.2.1 When is each status set?

Each module has Warning and Critical thresholds set in its configuration. These thresholds define the data values for which these statuses will be activated. If the module reports data outside these thresholds, it will be considered to be in Normal status.

Each module also has a time interval that sets the frequency with which it gets data. This interval is taken into account by the console when collecting data. If the module has not received data for twice its interval, it will be considered to be in Unknown status.

Finally, if the module has alerts configured and any of them has been triggered but not validated, then the module will have the corresponding Triggered Alert status.
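
A minimal sketch of this evaluation (illustrative only; the threshold names follow the min/max_warning and min/max_critical columns and the numeric codes follow tagent_status.status, both described in the database section above; the Triggered Alert status is left out):

  # Sketch of module status evaluation (not the actual console code).
  import time

  NORMAL, CRITICAL, WARNING, UNKNOWN = 0, 1, 2, 3  # as in tagent_status.status

  def module_status(value, last_contact, interval,
                    min_warning, max_warning, min_critical, max_critical):
      now = time.time()
      # Unknown: no data received for twice the module interval.
      if now - last_contact > 2 * interval:
          return UNKNOWN
      # The Critical and Warning thresholds define the ranges that activate
      # each status; anything outside them is Normal.
      if min_critical <= value <= max_critical:
          return CRITICAL
      if min_warning <= value <= max_warning:
          return WARNING
      return NORMAL

  now = time.time()
  print(module_status(95, now - 60, 300, 80, 90, 90, 100))   # prints 1 (CRITICAL)
  print(module_status(85, now - 60, 300, 80, 90, 90, 100))   # prints 2 (WARNING)
  print(module_status(50, now - 700, 300, 80, 90, 90, 100))  # prints 3 (UNKNOWN)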

1.2.2 Spreading and priority

Within the Pandora FMS organization, some elements depend on others, as for example the modules of an agent or the agents of a group. The same applies to Pandora FMS Enterprise policies, which have agents associated to them, and modules that are considered to be associated to each of those agents.

This structure is especially useful for evaluating module status at a glance. This is achieved by spreading the status upwards in the organization schema, granting that status to agents, groups and policies.


1.2.2.1 Which status does the agent have?

An agent will have the worst of its modules' statuses. At the same time, a group will have the worst of its agents' statuses, and the same goes for policies, which will have the worst status of their assigned agents.

That way, by seeing a group with a critical status, for example, you will know that at least one of its agents has that same status. To locate it, go down to the next level, that of the agents, and narrow down the module or modules that caused the spreading of the critical status.


1.2.2.2 Which should be the status priority?

When it is said that the worst status is spread, it must be clear which statuses are the most important. Therefore, there is a priority list, in which the first status has the highest priority and the last one the lowest. The lowest-priority status is only shown when all the elements have it.

  1. Triggered Alerts
  2. Critical status
  3. Warning status
  4. Unknown status
  5. Normal status


You may see that when a module has triggered alerts, its status has priority over the rest, and the agent and group it belongs to will have this status.

On the other hand, in order for a group to be in normal status, all its agents must have that status, which implies that all the modules of that group will be in normal status.
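
A minimal sketch of this propagation logic (illustrative, not actual Pandora FMS code): the status of an agent or group is simply the highest-priority status found among its children, following the list above:

  # Sketch of status propagation using the priority list above.
  PRIORITY = ["triggered alerts", "critical", "warning", "unknown", "normal"]

  def worst_status(statuses):
      # The first status of the priority list found among the children wins.
      for status in PRIORITY:
          if status in statuses:
              return status
      return "normal"

  modules_of_agent = ["normal", "warning", "normal"]
  agents_of_group = [worst_status(modules_of_agent), "normal", "critical"]
  print(worst_status(modules_of_agent))  # prints warning
  print(worst_status(agents_of_group))   # prints critical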

1.2.3 Color Code

Each one of the statuses mentioned has a color assigned, in order to easily see in the network maps when something does not work properly.

  • Orange status.png: Triggered alerts status
  • Red status.png: Critical status
  • Yellow status.png: Warning status
  • Grey status.png: Unknown status
  • Green status.png: Normal status

1.3 Pandora FMS graphs

Graphs are one of the most complex Pandora FMS implementations, because they gather information in real-time from the DB, and no external system is used (rrdtool or similar).

There are several graph behaviors depending on the type of source data:

  • Asynchronous modules. It is assumed that there is no data compaction. Data stored in the DB are all the real samples of the data (therefore, no compaction). It creates more "accurate" graphs without possible misinterpretation.
  • Text string modules. They show the rate of the collected data.
  • Numerical modules. Most modules report such data.
  • Boolean modules. These are numerical data from *PROC modules: for instance, ping checks, interface status, etc. 0 means wrong and 1 means "Normal".

1.3.1 Compression

Compression affects how graphs are represented. When two pieces of data with the same value are received, Pandora FMS does not store the last one. When drawing a graph, if there is no reference value at the point where the graph starts, Pandora FMS will search up to 48 hours back in time to find the last known value to take as a reference. If it does not find anything, it will start from 0.

In asynchronous modules, although there is no compression, the backwards search algorithm behaves similarly.
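
A short sketch of that backward search (illustrative only):

  # Sketch: find the reference value for the start of a graph, looking up to
  # 48 hours back; if nothing is found, the graph starts from 0.
  LOOKBACK = 48 * 3600

  def reference_value(samples, graph_start):
      # samples: (utimestamp, value) pairs sorted by time.
      candidates = [v for ts, v in samples
                    if graph_start - LOOKBACK <= ts <= graph_start]
      return candidates[-1] if candidates else 0

  samples = [(1000, 7), (5000, 9)]
  print(reference_value(samples, graph_start=6000))    # prints 9
  print(reference_value(samples, graph_start=400000))  # nothing in 48h, prints 0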

1.3.2 Interpolation

When making a graph, Pandora FMS takes 50xN samples (N is the graph resolution factor, which can be configured in the setup; it is 3 by default). For example, a monitor that returns data every 300 seconds (5 minutes) will generate 12 samples per hour, and 288 samples (12x24) a day. So a one-day graph does not represent 288 values; instead they are "compressed" into only 150 (50x3) samples.

This means some resolution is lost, and the more samples you have, the more detail you lose. This can be mitigated by querying the graph with a different interval or by zooming in.

Graph-explain.png

In normal graphs, interpolation is implemented in a simple way: if there are two samples within an interval (e.g. interval B of the example), the average will be calculated and represented.

In boolean graphs, if there are several data within a sample (in this case only 1 or 0), 0 will be shown. This helps to visualize failures within an interval, giving the problem priority over the normal status.

In both cases, if there is no data within a sample (because it is compressed or because it is missing), the last known value of the previous interval will be used, like interval E of the example above.
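
The behavior described in this section can be sketched as follows (a simplified illustration, not the real graph engine): samples are grouped into 50xN buckets, normal modules are averaged, boolean modules take the pessimistic value, and empty buckets carry the last known value forward:

  # Sketch of how samples are grouped into graph intervals.
  def bucketize(samples, start, end, resolution=3, boolean=False):
      # samples: (utimestamp, value) pairs with start <= utimestamp < end.
      slots = 50 * resolution
      width = (end - start) / slots
      buckets = [[] for _ in range(slots)]
      for ts, value in samples:
          buckets[min(int((ts - start) / width), slots - 1)].append(value)
      points, last = [], None
      for values in buckets:
          if values:
              # Boolean graphs are pessimistic: any 0 in the bucket shows 0.
              last = min(values) if boolean else sum(values) / len(values)
          # Empty buckets (compressed or missing data) reuse the last value.
          points.append(last)
      return points

  samples = [(i * 300, i % 2) for i in range(288)]   # one day of 5-minute data
  print(len(bucketize(samples, 0, 86400)))           # prints 150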

1.3.3 Avg/Max/Min

Grafica avg max min.png

The graphs show the average, maximum and minimum values by default. Because a sample (see Interpolation) can contain several data points, the average, maximum and minimum values of those data are shown. The more interpolation needed (the longer the display period is and the more data there are), the higher the interpolation level will be, and therefore the greater the difference between maximum and minimum values. If the graph range is short (an hour or so), there will not be any interpolation, or it will be minimal, so you will see the data at its real resolution and the three series will be identical.
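
Following the same bucketing idea, a small sketch of how the three series come out of the interpolated samples (illustrative only):

  # Sketch: the average, maximum and minimum series, one point per sample.
  def avg_max_min(buckets):
      avg = [sum(b) / len(b) for b in buckets]
      maximum = [max(b) for b in buckets]
      minimum = [min(b) for b in buckets]
      return avg, maximum, minimum

  buckets = [[2, 5, 3], [4], [1, 1, 2]]   # several data points per sample
  print(avg_max_min(buckets))
  # With little or no interpolation each bucket holds a single value and the
  # three series are identical.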


Go back to documentation index