Difference between revisions of "Pandora: Documentation en: Intro Monitoring"

From Pandora FMS Wiki

Revision as of 10:47, 18 June 2012

Go back to Pandora FMS documentation index

1 Introduction to monitoring

1.1 Monitoring with Pandora FMS

All user interaction with Pandora FMS takes place through the web console. The console follows current web standards and technologies, so it requires a modern browser, with Flash as an optional component. Firefox 2.x or higher is recommended. Internet Explorer 8 or higher also works, although the experience is less comfortable because of the peculiar way it handles some web controls.

In general terms, monitoring consists of executing processes (through modules) on a system and sending the resulting data to the server. The server processes this data, and the front end (the web console) displays it to the user.

Pandora FMS is a scalable monitoring tool. A single server can monitor around 1200 to 1500 agents, and with the right architecture (Meta Console) the number of monitored processes can grow without restrictions.

1.1.1 Monitoring with software agent vs. Remote monitoring

There are two main monitoring approaches in Pandora FMS: software agent based (local) monitoring and remote monitoring.

Software agent based monitoring places a piece of software (module) on the monitored system, e.g. to measure the percentage of CPU use on that system. Remote monitoring, in contrast, is done through network tests without placing modules on the monitored system, e.g. checking whether a certain host is active.

The main difference between the two is where the check runs: software agent based monitoring is executed from the monitored system itself, whereas remote monitoring is executed from the Pandora FMS server against the target system.

1.1.2 Agents at Pandora FMS

All monitoring done by Pandora FMS is managed through a generic entity called an "agent", which sits inside a more generic block called a group. An agent can belong to only one group.

The information is arranged logically through a hierarchy of groups, agents, module groups and modules. Some agents rely only on information provided by a software agent installed on the monitored system. Others contain only network information: this information does not come from a software agent, requires no software installation, and is gathered by network monitoring tasks executed from the Pandora FMS network servers.



AgentHierarchy.png


In the same way, there are agents that combine network information with information obtained through software agents.

The information is collected in modules that are assigned (logically) to Pandora FMS agents in the console. It is important to distinguish the concept of agent (the entity to which the modules holding the collected data are attached) from the software agents that run on remote systems.
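The hierarchy described in this section can be pictured with a small data model. This is only an illustrative sketch: the class names, field names, the "software_agent" and "network" labels and the sample agent are all invented for the example and do not reflect the actual Pandora FMS database schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Module:
    name: str
    module_group: str  # logical grouping of modules, e.g. "System"
    source: str        # "software_agent" or "network" (labels invented for this sketch)


@dataclass
class Agent:
    name: str
    group: str         # a single value: an agent belongs to only one group
    modules: List[Module] = field(default_factory=list)

    def sources(self) -> Set[str]:
        """Return the kinds of data sources that feed this agent."""
        return {module.source for module in self.modules}


# Hypothetical agent that mixes local and remote information.
web01 = Agent(
    name="web01",
    group="Servers",
    modules=[
        Module("cpu_usage", "System", "software_agent"),
        Module("host_alive", "Networking", "network"),
    ],
)
```

Here the agent is the logical container shown in the console, while the software agent is just one of the possible sources of its modules.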

1.1.3 Status/Event monitoring

Pandora FMS 3.0 adds an important new feature that changes the way Pandora FMS has worked until now. The user can now set thresholds that place any data in one of three possible statuses: NORMAL, WARNING and CRITICAL.

Automatically, all modules of type *proc are kept as NORMAL if their value is 1 or greater, and as CRITICAL if their value is lower than 1 (0 or a negative value).
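The automatic rule for *proc modules can be written as a one-line check. The function name below is hypothetical and the snippet is only a sketch of the rule as stated here, not the server's actual code.

```python
def proc_status(value: float) -> str:
    """Default status of a *proc module: NORMAL for 1 or more, CRITICAL below 1."""
    return "NORMAL" if value >= 1 else "CRITICAL"
```

For example, `proc_status(1)` gives NORMAL, while `proc_status(0)` and `proc_status(-1)` both give CRITICAL.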

But what happens with a CPU usage value? How could the system know whether it is a NORMAL, WARNING or CRITICAL value? By default it does not: it only receives a numeric value, and unless told otherwise it treats every value as "right", that is, as NORMAL status.

There are two status fields in the agent configuration that have not been mentioned before. These are the fields:

  • Warning status
  • Critical status

Each of these fields has two values, a minimum and a maximum. When they are configured correctly, some values will put a module in warning status and others in critical status:



Critico.jpg



These options are easier to understand with an example. By default, the CPU module is always shown in green in the agent status, because it simply reports a value between 0% and 100%. Suppose we want the CPU usage module to be shown in yellow (warning) when usage reaches 70%, and in red (critical) when it reaches 90%. We should configure:

  • Warning status: 70
  • Critical status: 90

With this, when the value reaches 90 the module will be shown in red (CRITICAL); between 70 and 89.99 it will be shown in yellow (WARNING); and under 70 it will be shown in green (NORMAL).
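The CPU example can be reproduced with a short sketch. The function and parameter names are hypothetical, and two behaviours are assumptions made for the illustration: a maximum of None means "no upper limit", and a configured maximum is treated as inclusive. The real server may handle these limits differently.

```python
from typing import Optional


def in_range(value: float, minimum: float, maximum: Optional[float]) -> bool:
    """True if value falls in [minimum, maximum]; None means no upper limit (assumption)."""
    if maximum is None:
        return value >= minimum
    return minimum <= value <= maximum


def numeric_status(value: float,
                   warn_min: float, crit_min: float,
                   warn_max: Optional[float] = None,
                   crit_max: Optional[float] = None) -> str:
    """Classify a numeric value; the critical range is checked before the warning range."""
    if in_range(value, crit_min, crit_max):
        return "CRITICAL"
    if in_range(value, warn_min, warn_max):
        return "WARNING"
    return "NORMAL"


# CPU example from the text: Warning status 70, Critical status 90.
for cpu in (45, 70, 89.99, 90, 100):
    print(cpu, numeric_status(cpu, warn_min=70, crit_min=90))
```

With these settings, 45 is classified as NORMAL, 70 and 89.99 as WARNING, and 90 and 100 as CRITICAL, matching the ranges described above.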

For a module of string type, you can configure the status using regular expressions in the Str fields of the Warning Status and Critical Status parameters. For example, suppose a module returns OK, ERROR connection fail or BUSY to much devices, depending on the query result.

To configure the WARNING and CRITICAL module statuses we will use the following regular expressions:

Warning Status: .*BUSY.*
Critical Status: .*ERROR.*

Take care, because these regular expressions are case sensitive. With this module configuration the status will be WARNING when the data contains the string BUSY, and CRITICAL when the data contains the string ERROR.

If both statuses happen to be configured with the same values, the critical status takes preference. That is, the Warning status will never be reached, because the Critical status is more important.
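The string example and the precedence rule can be sketched as follows. Python's `re` module is used here only as a stand-in for the server's own regular expression engine, and the function name is hypothetical. The critical expression is tested first, which reproduces the stated preference when both statuses match.

```python
import re


def string_status(data: str, warning_regex: str, critical_regex: str) -> str:
    """Classify string data; matching is case sensitive and critical wins over warning."""
    if re.search(critical_regex, data):
        return "CRITICAL"
    if re.search(warning_regex, data):
        return "WARNING"
    return "NORMAL"


# Expressions from the example above.
WARNING_RE = r".*BUSY.*"
CRITICAL_RE = r".*ERROR.*"

for output in ("OK", "ERROR connection fail", "BUSY to much devices", "busy"):
    print(output, "->", string_status(output, WARNING_RE, CRITICAL_RE))
```

The lowercase "busy" stays NORMAL because the expressions are case sensitive. If the same expression is used for both statuses, every match is reported as CRITICAL and WARNING is never reached.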

This is an example of modules in each of the status:



Colorin.jpg


These fields make no sense for modules that return only boolean values (1 or 0).

These statuses are shown on the main screen of the monitor view, so you can see at a glance how many checks are in Normal, Warning or Critical status.

1.1.4 Other Common Monitoring Parameters

1.1.4.1 Historical



Historicaldata.png



Pandora FMS can optionally keep the history of any data, on a per-module basis. By default all modules keep a history (so that they can be graphed, included in historical or trend reports, etc.). But in a very large deployment that monitors a lot of data, you may not need to keep the history of some of it, which saves resources.

This option lets you disable the history for modules where you do not need it. Even with the history disabled, alerts keep working exactly the same, as do event generation and the view of the monitor's current status.

1.1.4.2 FF Threshold



Fft.png



The FF Threshold parameter (FF = FlipFlop) is used to "filter" continuous changes of state when generating events and statuses. It tells Pandora FMS that an element is not considered to have changed status until it has stayed in the new status at least X times in a row after leaving the original one. Let's look at a classic example: a ping to a host where there is packet loss. In an environment of this kind, the results could look like this:


1
1
0
1
1
0
1
1
1

However, the host is alive in all these cases. What we really want to tell Pandora FMS is not to show the host as down until it has reported down at least three times in a row. In the previous case it would therefore never be shown as down; it would only be shown as down in a case like this:

1
1
0
1
0
0
0

From this point on it will be shown as down, but not before.

So the flip-flop protection is useful to avoid these disturbing fluctuations. All modules implement it, and its purpose is to avoid spurious changes of status (whether determined by user-defined limits or by automatic limits, as in the case of the *proc modules).
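The filtering behaviour can be simulated with a few lines of code. This is a minimal sketch of the rule as described in this section, assuming the new status must be seen a given number of times in a row before it is displayed. The function name is hypothetical, and the exact counting used by the server may differ.

```python
from typing import List


def apply_ff_threshold(raw: List[int], threshold: int) -> List[int]:
    """Return the displayed status after each sample.

    A status change is accepted only after `threshold` consecutive
    samples in the new status.
    """
    if not raw:
        return []
    displayed = raw[0]   # start from the first observed status
    candidate = None     # new status waiting for confirmation
    count = 0            # consecutive samples seen in the candidate status
    result = []
    for status in raw:
        if status == displayed:
            candidate, count = None, 0      # fluctuation ended, reset
        elif status == candidate:
            count += 1
        else:
            candidate, count = status, 1
        if candidate is not None and count >= threshold:
            displayed, candidate, count = candidate, None, 0
        result.append(displayed)
    return result


# The two ping sequences from the text, with a threshold of 3.
print(apply_ff_threshold([1, 1, 0, 1, 1, 0, 1, 1, 1], 3))  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(apply_ff_threshold([1, 1, 0, 1, 0, 0, 0], 3))        # [1, 1, 1, 1, 1, 1, 0]
```

In the first sequence the isolated zeros never reach three in a row, so the host is always displayed as up. In the second, the host is displayed as down only at the third consecutive zero.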
