Difference between revisions of "Pandora: Documentation en: Intro Monitoring"

From Pandora FMS Wiki

Revision as of 13:57, 10 April 2017

Go back to Pandora FMS documentation index

1 Introduction to Monitoring

1.1 Monitoring with Pandora FMS

All user interaction with Pandora FMS is done through the web console. The console follows current web standards and technologies, so it requires an advanced browser and, optionally, Flash. Firefox 2.x or higher is recommended. You can also use Internet Explorer 8 or higher, although the user experience is less comfortable because of its peculiar way of managing some web controls.

Generally speaking, monitoring consists of executing processes (through modules) on a system and sending the resulting data to a server. The server processes that data, and the front end (the web console) displays it to the user.

Pandora FMS is a scalable monitoring tool. A single server can monitor about 1200 to 1500 agents, and with the correct architecture (Meta Console) the number of monitoring processes can grow without restriction.

1.1.1 Monitoring by Software Agent vs. Remote Monitoring

There are two main monitoring procedures with Pandora FMS: software agent based (local) monitoring and remote monitoring.

Software agent based monitoring places a piece of software (a module) in the monitored system, e.g. to measure the percentage of CPU usage on that system. Remote monitoring is done through network tests without the use of modules, e.g. checking whether a certain host is active or not.

The main difference between these two types is that software agents are executed from the monitored system, whereas remote monitoring is executed from the Pandora FMS Server against the target system.

1.1.2 Agents on Pandora FMS

All monitoring done by Pandora FMS is managed through a generic entity called 'Agent' which is incorporated into a more generic block called 'Group'. An agent can only belong to one group.

Information is logically arranged in a hierarchy of groups, agents, module groups and modules. Some Agents are based solely on the information given by a software agent installed on the system. Others hold only network information, which does not come from a software agent: no software needs to be installed, and the network monitoring tasks are executed from the Pandora FMS Network Servers.



AgentHierarchy.png


There are also agents which hold both network information and information obtained through software agents.

The information is collected in modules which are logically assigned to Pandora FMS agents in the console. It's important to distinguish the concept of Agents (where the modules which contain the collected information are located) from Software Agents, which are executed on remote systems.

1.1.3 Status and Event Monitoring

Pandora FMS 3.0 added an important feature: the user can set thresholds that classify any data into one of three possible states:

'NORMAL', 'WARNING' and 'CRITICAL'.

Automatically, all modules of the 'proc' kind are defined as 'NORMAL' if they have a value of '1' or higher. They will be defined as 'CRITICAL' if they have a value lower than '1' ('0' or a negative value).
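
The automatic rule for 'proc' modules can be written down as a minimal Python sketch. The function name `proc_status` is hypothetical and only illustrates the rule stated above; it is not part of Pandora FMS.

```python
def proc_status(value):
    """Automatic status of a 'proc' module, as described in the text.

    Hypothetical helper for illustration only: a value of 1 or higher
    is 'NORMAL', anything lower (0 or negative) is 'CRITICAL'.
    """
    return "NORMAL" if value >= 1 else "CRITICAL"


print(proc_status(1))   # process is up
print(proc_status(0))   # process is down
```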

But what happens with a value such as CPU usage? How could the system know whether it's a 'NORMAL', 'WARNING' or 'CRITICAL' value? It doesn't know by default: it only gets a numeric value, and if nothing has been defined for it, every value is considered to be in 'NORMAL' status.

There are two status fields in the agent configuration which haven't been mentioned before. These are:

  • Warning status
  • Critical status

Each of those two fields takes two values: Minimum and Maximum. When they are configured correctly, some values will show a module in 'WARNING' status and others in 'CRITICAL' status:



Critico.jpg



To understand these options better, it's best to see an example. By default, the CPU module is always shown in green in the agent status, because it simply reports a value between 0% and 100%. If we want the CPU usage module to be shown in yellow ('WARNING') when usage reaches e.g. 70%, and in red ('CRITICAL') when it reaches e.g. 90%, it's recommended to configure:

  • Warning status: 70
  • Critical status: 90

With these settings, the module is shown in red ('CRITICAL') when the value reaches 90%, in yellow ('WARNING') when it is between 70% and 89.99%, and in green ('NORMAL') when it is under 70%.
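
The resulting classification can be illustrated with a short Python sketch. It is only an approximation of the behaviour described above: the function `classify` and its parameters `warning_min` and `critical_min` are hypothetical names, and the sketch assumes that only the lower bound of each status field is set.

```python
def classify(value, warning_min=70.0, critical_min=90.0):
    """Classify a numeric module value using the example thresholds.

    Hypothetical helper, not Pandora FMS code. The critical check runs
    first so that 'CRITICAL' takes precedence over 'WARNING'.
    """
    if value >= critical_min:
        return "CRITICAL"   # shown in red
    if value >= warning_min:
        return "WARNING"    # shown in yellow
    return "NORMAL"         # shown in green


for cpu_usage in (35.0, 70.0, 89.99, 90.0):
    print(cpu_usage, classify(cpu_usage))
```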

For a module of string type, you can configure the status using a regular expression in the Str fields of the 'Warning' and 'Critical' status parameters. Suppose, for example, a module that returns OK, ERROR: Connection fail or BUSY: Too many devices, depending on the query result.

To configure the 'WARNING' and 'CRITICAL' module status, we will use the following regular expressions:

Warning Status: .*BUSY.*
Critical Status: .*ERROR.*

Be careful here, because these regular expressions are case sensitive. With this module configuration, the status will be 'WARNING' if the data contains the string BUSY and 'CRITICAL' if the data contains the string ERROR.

If, by any chance, both states are configured with the same values, the 'Critical' value will always have precedence. In this case, 'Warning' status is unreachable, because 'Critical' status is more important.
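
A minimal Python sketch of this string classification follows. It assumes Python's `re` module behaves like the regular expressions used by Pandora FMS for these simple patterns, which the text does not confirm; the function `classify_string` is a hypothetical name.

```python
import re


def classify_string(data, warning_pattern=r".*BUSY.*", critical_pattern=r".*ERROR.*"):
    """Classify string data with the two example regular expressions.

    Hypothetical helper, not Pandora FMS code. Matching is case
    sensitive, and the critical pattern is tested first, so it wins
    when both patterns match the same data.
    """
    if re.search(critical_pattern, data):
        return "CRITICAL"
    if re.search(warning_pattern, data):
        return "WARNING"
    return "NORMAL"


for result in ("OK", "BUSY: Too many devices", "ERROR: Connection fail"):
    print(result, "->", classify_string(result))
```

Because matching is case sensitive, data such as `error: connection fail` stays 'NORMAL' in this sketch.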

This is an example of the modules in each of the states:



Colorin.jpg


These fields make no sense for modules which only return boolean values ('1' or '0').

These values are shown on the main screen of the monitor view, so a quick look tells you how many checks are in the 'Normal', 'Warning' or 'Critical' states.

1.1.4 Other Common Monitoring Parameters

1.1.4.1 Historical Data



Historicaldata.png



Pandora FMS lets you choose, for each piece of data individually, whether its history is kept. By default, all modules keep a history, so they can generate graphs and be included in historical (trend) reports. In a very large deployment which monitors a lot of data, you may not need the history of some modules, and not keeping it uses fewer resources.

This option deactivates the history of modules for which you don't need one. Even with the history deactivated, alerts continue to work in exactly the same way, as do event generation and the view of the monitor's current state.

1.1.4.2 FF Threshold



Fft.png



The FF Threshold parameter (FF = FlipFlop) is used to 'filter' continuous changes of state when events and statuses are created. In Pandora FMS, you can specify that an element is not considered to have changed until it has reported the new status at least X times after leaving its original status. Let's see an example: a ping to a host with packet loss. In an environment like this, it's possible to receive the following results:


1
1
0
1
1
0
1
1
1

However, the host is alive in all cases. What we really want to tell Pandora FMS is: do not show the host as down until it has been reported down at least three times. In the previous case it would never be shown as down; it would only be shown as down in this case:

1
1
0
1
0
0
0

From this point it will be shown as down - but not before that.

'Flip_Flop' protection is therefore useful for filtering out disruptive fluctuations, and all modules implement it. Its purpose is to avoid unwanted changes of status (as determined by the defined limits or by the automatic ones, as in the case of 'proc' modules).

Since version 5.1, the FF threshold has two modes.
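
The filtering in the two ping examples can be reproduced with the following Python sketch. It is an illustration under stated assumptions, not the server's implementation: the function `filter_flip_flop` and the parameter `required_repeats` are hypothetical, the repeats are assumed to be consecutive, and `required_repeats` is the number of times a new status must be seen, which is not necessarily the number entered in the FF threshold field.

```python
def filter_flip_flop(samples, required_repeats, initial_status=1):
    """Return the status shown after each sample.

    Hypothetical helper for illustration. A new value replaces the
    shown status only after it has been received `required_repeats`
    times in a row (an assumption of this sketch).
    """
    shown = initial_status
    candidate = None   # value that is trying to replace the shown status
    streak = 0         # how many times in a row the candidate was seen
    history = []
    for value in samples:
        if value == shown:
            candidate, streak = None, 0      # fluctuation ended, reset
        elif value == candidate:
            streak += 1
        else:
            candidate, streak = value, 1
        if candidate is not None and streak >= required_repeats:
            shown = candidate                # change is now accepted
            candidate, streak = None, 0
        history.append(shown)
    return history


first = filter_flip_flop([1, 1, 0, 1, 1, 0, 1, 1, 1], required_repeats=3)
second = filter_flip_flop([1, 1, 0, 1, 0, 0, 0], required_repeats=3)
print(first)    # the host is never shown as down
print(second)   # the host is shown as down only at the last sample
```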

  • All state changing: the same value is used for every change of status (to normal, warning and critical).
  • Each state changing: a different value can be set for each change of status (to normal, warning and critical).

In async modules, a timeout (FF timeout) can also be set. It's useful if you want to fire an alert only when the data server receives several critical or warning data in a short period of time. When the interval between data arrivals exceeds the timeout, the FF threshold counter is reset.

Ff timeout.png

For example, to fire an alert only when an agent sends critical data twice within 5 minutes (and not when the interval between data arrivals exceeds 5 minutes), set the FF threshold to 1 and the FF timeout to 300 (5 minutes expressed in seconds).
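
The effect of the timeout can be sketched in Python as follows. This is a simplified illustration, not Pandora FMS code: the function `confirmed_criticals` and its parameters are hypothetical, and `required_repeats=2` stands for "critical data twice", which the text above obtains by setting the FF threshold field to 1.

```python
def confirmed_criticals(arrivals, required_repeats=2, timeout_seconds=300):
    """Return the timestamps at which critical status is confirmed.

    Hypothetical helper for illustration. `arrivals` is a list of
    (timestamp_in_seconds, is_critical) tuples in arrival order. The
    counter is reset when the gap between two arrivals exceeds the
    timeout, or when non-critical data arrives.
    """
    confirmed = []
    count = 0
    last_time = None
    for timestamp, is_critical in arrivals:
        if last_time is not None and timestamp - last_time > timeout_seconds:
            count = 0                        # data arrived too late, reset
        last_time = timestamp
        if is_critical:
            count += 1
            if count >= required_repeats:
                confirmed.append(timestamp)  # an alert could fire here
                count = 0
        else:
            count = 0
    return confirmed


print(confirmed_criticals([(0, True), (200, True)]))   # second datum arrives in time
print(confirmed_criticals([(0, True), (400, True)]))   # gap exceeds the timeout
```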
