Pandora: QuickGuides EN: Alert configuration

From Pandora FMS Wiki
Revision as of 14:29, 20 January 2017 by Steve alvey (talk | contribs) (Pandora FMS Alert Configuration Quick Guide)

1 Pandora FMS Alert Configuration Quick Guide


Esquema-alert-structure.png


Alerts comprise:

  • Commands
  • Actions
  • Templates

The command defines the operation that is executed when the alert is triggered. Examples of commands are: writing a note to a log, sending an email or SMS, executing a script or program, etc.

An action relates a command to a template and lets you customize the command execution via three generic parameters: Field1, Field2 and Field3. These parameters are passed to the command as input parameters when it is executed.

Templates define the conditions for triggering an alert, whether there will also be a recovery action, and the default action to execute.

  • Trigger conditions: the conditions under which an alert is triggered, e.g. exceeding a threshold, registering critical status, etc. They are defined in the template.
  • Triggered actions: always associated with a command; they allow the execution of the command to be customized and pass arguments to it via Field1, Field2, etc.
  • Alert recovery: actions to be carried out when the system recovers from an alert and is back to normal status.

1.1 Data flow in the alerts system

When defining actions and templates there are some generic fields available (Field1, Field2, etc.) that become the input parameters of the command execution. The values of these parameters propagate from the template to the action and finally to the command. Propagation from template to action only occurs if the corresponding action field has no assigned value. If the action field has an assigned value, it keeps that value, which takes precedence over the one it would otherwise inherit from the template.

E.g. 1: Field1 and Field2 have content in the template, but NOT in the action. The action inherits the content for its own Field1 and Field2, which become the command parameters.

E.g. 2: Field1 and Field2 have content in the template, but ALSO in the action. The action won't inherit the content of Field1 or Field2 from the template, as it has its own content.



Esquema-parameters-carrying.png


This would be an example of how to overwrite the values of the template using those of the action:


Alertas esquema6.png


For example, suppose we create a template that triggers an alert and sends an email, using the following fields:

  • Template:
    • Field1: [email protected]
    • Field2: [Alert] The alert was fired
    • Field3: The alert was fired!!! SOS!!!

The values which would reach the command would be:

  • Command:
    • Field1: [email protected]
    • Field2: [Alert] The alert was fired
    • Field3: The alert was fired!!! SOS!!!

For Field2 and Field3 the values defined in the template are maintained, but Field1 uses the value defined in the action, which overrides the one in the template.
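The inheritance rule in this section can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Pandora FMS code: the function name `resolve_fields` and both email addresses are hypothetical placeholders.

```python
def resolve_fields(template, action):
    """Return the field values that would reach the command.

    For each field, a non-empty action value takes precedence;
    an empty action value inherits the template value.
    """
    resolved = {}
    for name in ("field1", "field2", "field3"):
        action_value = action.get(name, "")
        resolved[name] = action_value if action_value else template.get(name, "")
    return resolved


# Hypothetical values: the addresses below are placeholders, not taken from the guide.
template = {
    "field1": "operator@example.com",
    "field2": "[Alert] The alert was fired",
    "field3": "The alert was fired!!! SOS!!!",
}
action = {"field1": "boss@example.com", "field2": "", "field3": ""}

command_fields = resolve_fields(template, action)
for name, value in command_fields.items():
    print(f"{name}: {value}")
```

Here Field1 comes from the action, while Field2 and Field3 are inherited from the template because the action leaves them empty.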


1.2 Defining a single Alert

Now suppose we have a single need: to monitor one module that has numerical values. In our case it's a module that measures the system CPU; in other cases it could be a temperature sensor that reads the value in degrees Celsius. Let's first make sure that our module receives the data correctly:


Qgcpu1.png

In this screenshot, we can see that we have a module called cpu_sys with a current value of 7. In our case, we want the system to fire an alert when the value reaches 20 or more. To achieve this, we're going to configure the module so that it goes to CRITICAL status when it reaches 20 or more. Click on the wrench icon to configure the monitor's behavior:


Qgcpu2.png

We modify the value highlighted in red, as shown in the following screenshot:


Qgcpu3.png

Accept and save the changes. Now, when the CPU module value goes up to 20 or higher, it will change status to CRITICAL and be marked in red, as we can see here.
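The threshold rule just configured can be summarized with a small Python sketch. It only illustrates the comparison described in this guide (20 or higher means CRITICAL); the function name and the constant are hypothetical, and a real module can have further thresholds that are not modelled here.

```python
CRITICAL_MIN = 20  # threshold used in this guide's example


def module_status(value, critical_min=CRITICAL_MIN):
    """Return "CRITICAL" when the value reaches the threshold, otherwise "OK"."""
    return "CRITICAL" if value >= critical_min else "OK"


for sample in (7, 20, 23.0):
    print(sample, module_status(sample))
```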


Qgcpu4.png

The system knows how to recognize when something is right (OK, green color) and when it is wrong (CRITICAL, red color). What we want now is for Pandora FMS to send us an email when the module changes to this status. To do so, we will use the Pandora FMS alert system.

The first thing we should do is make sure that there is at least one command that does what we need (sending an email). This example is easy because Pandora FMS includes a default command for sending emails.

1.3 Configuring the Alert

Now we have to create an action called "Send an email to the operator". To do so, go to the menu -> Alerts -> Actions and click to create a new action:


Qgcpu5.png

This action uses the command "Send email" and it's really simple: you only need to fill in one field (Field 1) and leave the other two empty. This is one of the most confusing parts of the Pandora FMS alert system: what are the fields field1, field2 and field3?

These fields are used to "pass" information from the alert template to the command, so both the template and the action can supply different information to the command line. In this case, the action only sets field1, and we leave field2 and field3 to the template, as we can see below.

Field 1 is the one we use to define the operator's email, in this case a fictitious address, "[email protected]".

1.4 Configuring the Template (Alert template)

Now we have to create an alert template that is as generic as possible, so that it can be reused later. For example, "This is wrong because I have a module in Critical status", which by default sends an email to the operator. Go to the administration menu -> Alerts -> Templates and click on the button to create a new alert template:


Qgcpu6.png


The element that defines the condition is the "Condition" field. In this case it should be set to "Critical status", so that this template, when associated with a module, fires when that module goes to critical status. We previously configured the "cpu_sys" module to turn to critical status when its value reaches 20 or more.

The priority defined here as "Critical" is the priority of the alert, which has nothing to do with the "Critical" status of the module. The criticality of alerts allows us to visualize them in other views, such as the event view, with different identifiers.


Go to step 2 by clicking on the "next" button:


Qgcpu7.png

Step 2 defines the fine-tuning values of the alert template's trigger condition. The first ones are quite simple: they restrict the alert to specific days and to a specific time period.

The most critical parameters here are the following:

  • Time threshold: set to one day by default. Suppose a module stays down for an entire day. With a time threshold of 5 minutes, we would receive an alert every 5 minutes. With a threshold of one day (24 hours), the alert is sent only once, when it is first triggered. If the module recovers and then falls again, the alert is sent again; but if the module stays down after that second fall, no further alert is sent until another full 24 hours have passed.
  • Min. Number of alerts: the minimum number of times that the condition must occur (in this case, that the module is in CRITICAL status) before Pandora FMS executes the actions associated with the alert template. This is a way to avoid being flooded with alerts by false positives, or by erratic behavior (bouncing). If we put 1 here, the first occurrence is not taken into account and the alert only fires once the condition has already happened at least once. If we put 0, the alert fires the first time the condition is met.
  • Max. Number of alerts: 1 means that the action is executed only once. If we set it to 10, the action is executed up to 10 times. It's a way to limit the number of times an alert can be executed.
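How these three parameters interact can be easier to see in a simplified simulation. The sketch below is an assumption-laden model of the behavior described in the list above, not the Pandora FMS server logic: the class `AlertState`, its attribute names and the reset-on-recovery rule are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Simplified, hypothetical model of one alert's firing rules."""
    min_alerts: int = 0           # occurrences ignored before firing
    max_alerts: int = 1           # maximum firings per time window
    time_threshold: int = 86400   # window length in seconds (one day)
    window_start: Optional[float] = None
    matches: int = 0
    fired: int = 0

    def evaluate(self, now, condition_met):
        """Return True if the action would run for this check."""
        if not condition_met:
            # Assumed: a recovery clears the counters, so a new fall fires again.
            self.window_start = None
            self.matches = 0
            self.fired = 0
            return False
        expired = (self.window_start is not None
                   and now - self.window_start >= self.time_threshold)
        if self.window_start is None or expired:
            # Start a new time window.
            self.window_start = now
            self.matches = 0
            self.fired = 0
        self.matches += 1
        if self.matches > self.min_alerts and self.fired < self.max_alerts:
            self.fired += 1
            return True
        return False


# A module that stays down for a whole day, checked every 5 minutes.
state = AlertState()
firings = [t for t in range(0, 86400, 300) if state.evaluate(t, True)]
print(len(firings))
```

With the defaults (one-day threshold, minimum 0, maximum 1) the module that stays down all day produces a single firing in this model; raising `max_alerts` or shortening `time_threshold` increases the count.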

Here the fields field1, field2 and field3 appear again. Field1 is blank because it is exactly the one we defined when we configured the action. In the "Send email" action, field2 and field3 define the message's subject and text, whereas field1 defines the recipients (separated by commas). So the template, using some macros, defines the subject and the body of the alert message. In our case we would receive a message like the following (supposing that the agent containing the module is called "Farscape"):

To: [email protected]
Subject: [PANDORA] Farscape cpu_sys is in CRITICAL status with value 20
Email text:
This is an automated alert generated by Pandora FMS
Please contact your Pandora FMS for more information. *DO NOT* reply to this email.
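The macro substitution behind a subject like this one can be illustrated with a short sketch. The guide does not list the macros used by the template, so the macro names below (`_agent_`, `_module_`, `_data_`) and the subject template are assumptions made for the example, and `expand_macros` is a hypothetical helper, not a Pandora FMS function.

```python
def expand_macros(text, values):
    """Replace each macro name found in `text` with its value."""
    for macro, value in values.items():
        text = text.replace(macro, str(value))
    return text


# Assumed macro names and subject template, for illustration only.
subject_template = "[PANDORA] _agent_ _module_ is in CRITICAL status with value _data_"
values = {"_agent_": "Farscape", "_module_": "cpu_sys", "_data_": 20}

subject = expand_macros(subject_template, values)
print(subject)
```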

Given that the default action is the one we defined previously, all the alerts that use this template will use this predefined action by default, unless it is modified.

In the third step, we'll see that it's also possible to configure the alert system so that it notifies us when the alert has stopped.


Qgcpu8.png

It's almost the same, but field1 is not defined here because it reuses the one defined in the previously executed action (when the alert was fired). In this case it will only send an email whose subject says that the condition in the cpu_sys module has recovered.

Alert recovery is optional. It's important to note that if field2 and field3 are defined in the alert recovery data, they ignore and overwrite the action fields; that is, they take priority over them. The only field that can't be modified is field1.

1.5 Associating the Alert to the Module

Now that we have all we need, we only have to associate the alert template with the module. To do this, go to the alert tab in the agent where the module is:


Qgcpu9.png

It's easy. In this screenshot we can see an alert already configured for a module named "Last_Backup_Unixtime", using the same template that we defined before as "critical Module". Now, in the controls below, we create an association between the module "cpu_sys" and the alert template "Module critical". By default it shows the action that we defined in this template, "Send email to Sancho Lerena".

1.6 Scaling Alerts

The values placed under the "Number of alerts match from" parameter define the alert scaling. This lets us refine the alert behavior a little more: if we've defined a maximum of 5 times for an alert to be fired, and we only want it to send us one email, we should enter 0 and 1, telling it to send us an email only from firing 0 to 1 (that is, once).

We can also add more actions to the same alert, using the "Number of alerts match from" field to define the alert behavior according to how many times it has been fired.

For example: we want it to send an email to XXXXX the first time it happens and, if the monitor continues being down, to send an email to ZZZZ. To do so, after associating the alert, we can add more actions to a previously defined alert in the assigned alerts table, as we can see in the following screenshots:


Qgcpu9.png


Qgcpu10.png
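The scaling rule can be sketched as a lookup from the firing count to the actions that apply. This is a hedged illustration: it assumes that a range "from 0 to 1" covers only the first firing (as the text suggests with "that means once"), and the function `actions_for`, the action names and the second range (1 to 5) are hypothetical.

```python
def actions_for(fire_number, escalation):
    """Return the actions whose range covers this firing.

    `fire_number` is 1 for the first time the alert fires, 2 for the second, etc.
    A range (start, end) is assumed to cover firings start+1 through end.
    """
    return [name for name, start, end in escalation if start < fire_number <= end]


# Hypothetical escalation: first firing goes to XXXXX, later firings to ZZZZ.
escalation = [
    ("Send email to XXXXX", 0, 1),
    ("Send email to ZZZZ", 1, 5),
]

for n in range(1, 7):
    print(n, actions_for(n, escalation))
```

Under these assumptions the first firing mails XXXXX, firings 2 to 5 mail ZZZZ, and a sixth firing would match no action.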

1.7 Standby alerts

Alerts can be enabled, disabled or in standby mode. The difference between disabled and standby alerts is that disabled alerts simply don't work and therefore are not shown in the alerts view. Standby alerts are shown in the alerts view and work, but only at a display level: the view shows whether they are triggered or not, but they do not execute the configured actions and do not generate events.

Standby alerts are useful for watching an alert's state without triggering its actions or events.
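The three modes can be summarized as a small table in code. The sketch below only restates the behavior described in this section; the enum `AlertMode` and the function `behaviour` are hypothetical names, and the assumption that enabled alerts generate events is inferred from the contrast with standby mode.

```python
from enum import Enum


class AlertMode(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    STANDBY = "standby"


def behaviour(mode):
    """Describe what an alert does in the given mode."""
    return {
        # Disabled alerts are the only ones hidden from the alerts view.
        "shown_in_alerts_view": mode is not AlertMode.DISABLED,
        # Only enabled alerts run actions and (assumed) generate events.
        "executes_actions": mode is AlertMode.ENABLED,
        "generates_events": mode is AlertMode.ENABLED,
    }


for mode in AlertMode:
    print(mode.value, behaviour(mode))
```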

1.8 Using Alert Commands other than email

The email command is internal to Pandora FMS and can't be configured, which means that field1, field2 and field3 have fixed meanings: they define the recipient, subject and text of the message. But what happens if we want a different, user-defined action?

We're going to define a new command, something completely defined by us. Imagine that we want to create a log file with an entry for each alert that is fired. The format of this log file should be something like:

DATE_HOUR - NAME_AGENT - NAME_MODULE - VALUE - PROBLEM DESCRIPTION

Where VALUE is the value of the module at that moment. There will be several log files, depending on the action that calls the command. The action defines the description and the file to which the events are written.

To do this, we first create a command as follows:


Qgcpu11.png

And we're going to define an action:


Qgcpu12.png

If we take a look at the log that we've created:


2010-05-25 18:17:10 - farscape - cpu_sys - 23.00 - Custom alert for LOG#1

We can see that the alert was triggered at 18:17:10 on the "farscape" agent, on the "cpu_sys" module, with a value of "23.00" and with the description that we chose when we defined the action.
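For readers who prefer to see the line format spelled out, here is a minimal Python sketch that builds the same kind of entry. It is not the command shown in the screenshots (that one is defined inside Pandora FMS); the function names `format_log_line` and `append_alert` are hypothetical, and the two-decimal value format is inferred from the sample line.

```python
from datetime import datetime


def format_log_line(when, agent, module, value, description):
    """Build a line in the form DATE_HOUR - AGENT - MODULE - VALUE - DESCRIPTION."""
    return f"{when:%Y-%m-%d %H:%M:%S} - {agent} - {module} - {value:.2f} - {description}"


def append_alert(path, line):
    """Append one entry to the log file at `path`."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(line + "\n")


line = format_log_line(
    datetime(2010, 5, 25, 18, 17, 10),
    "farscape", "cpu_sys", 23, "Custom alert for LOG#1",
)
print(line)
```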

Because the command execution, the field order and other details can make it hard to understand how the command is finally executed, the easiest approach is to enable the Pandora server debug traces (verbose 10) in the server configuration file /etc/pandora/pandora_server.conf and restart the server (/etc/init.d/pandora_server restart). Then look in /var/log/pandora/pandora_server.log for the exact line with the execution of the alert command we've defined, to see how the Pandora FMS server is running it.
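Searching the server log for that line can be done with any text tool; as one possible sketch, the Python snippet below prints the log lines that contain a given piece of text. The log path is the one given in this guide, while the helper `matching_lines` and the search text are hypothetical choices. The exact wording of the server's debug lines is not assumed here, so search for a string you know appears in your own command or action.

```python
import os

LOG_PATH = "/var/log/pandora/pandora_server.log"  # path given in this guide


def matching_lines(lines, needle):
    """Yield the lines that contain `needle`, with trailing newlines removed."""
    for line in lines:
        if needle in line:
            yield line.rstrip("\n")


# Only runs when the log exists and is readable on this machine.
if __name__ == "__main__" and os.access(LOG_PATH, os.R_OK):
    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        # Hypothetical search text: the description used in the example action.
        for hit in matching_lines(log, "Custom alert for LOG#1"):
            print(hit)
```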