Difference between revisions of "Pandora: QuickGuides EN: Alert configuration"
Revision as of 10:31, 22 February 2016
- 1 Pandora FMS Alert Configuration Quick Guide
- 1.1 Introduction to the Current Alert System
- 1.2 Defining a single Alert
- 1.3 Configuring the Alert
- 1.4 Configuring the Template (Alert template)
- 1.5 Associating the Alert to the Command
- 1.6 Scaling Alerts
- 1.7 Standby alerts
- 1.8 Using Alert Commands different from the email
1 Pandora FMS Alert Configuration Quick Guide
1.1 Introduction to the Current Alert System
People often complain about the complexity of defining alerts in Pandora FMS. Up to version 2.0, alerts were simpler to configure: for each alert we defined the condition and the reaction the alert provoked when the condition was met or not. It was more "intuitive" (although it also had fields, such as the alert "threshold", that caused many people headaches). It was very simple, but was it worth it?
One of our best users (he had lots of agents installed and managed Pandora FMS really well) told us that creating an alert for 2000 modules was very difficult, especially when something had to be modified in all of them. Because of this and other problems, we made the alert system modular: we separated the definition of the alert firing condition (Alert template) from the action to execute when it is fired (Alert action), and isolated both from the command that is executed in the action (Alert command). The combination of an alert template with a module is what produces the alert.
This way, if I have 1000 systems with a module called "Host alive" and all of them have an associated alert template called "Host down", then an action called "Call to the operator" will be executed by default, and if I want to change the minimum number of alerts that should be fired before notifying the operator, I only need to change the definition of the template, instead of modifying 1000 instances.
Many users only manage a few dozen machines, but there are users with hundreds, even thousands of systems monitored with Pandora FMS. This approach makes it possible for Pandora FMS to manage all kinds of environments.
1.1.1 Alert structure
An alert is composed of a command, an action and a template:
A command defines the operation to perform when the alert is fired. Some examples of a command could be: write to a log, send an email or SMS, execute a script or a program, etc.
An action links a command with a template and lets you customize the command execution through three generic parameters: Field 1, Field 2 and Field 3, which are passed as input parameters when the command is executed.
In the template you define the alert's generic parameters: firing conditions, firing actions and alert recovery.
- Firing conditions: the conditions under which the alert will be fired, for example: when the data is above a threshold, when the status is critical, etc.
- Firing actions: allows configuring the action that will be performed when the alert is fired.
- Alert recovery: allows configuring the actions that will be performed when the system is recovered after the alert was fired.
1.1.2 Alert system information flow
When you define actions and templates, you have three generic fields: Field1, Field2 and Field3. They are passed as input parameters when the command is executed. Their values are propagated from the template to the action, and then to the command. A value is propagated from the template to the action only if the corresponding field in the action is empty; otherwise the action's value is used.
The following example shows how template values are overwritten by action values.
Suppose we create a template that fires an alert and sends an email with the following fields:
- Field1: [email protected]
- Field2: [Alert] The alert was fired
- Field3: The alert was fired!!! SOS!!!
And an action that defines only the following field:
- Field1: [email protected]
The values that will be passed to the command are:
- Field1: [email protected]
- Field2: [Alert] The alert was fired
- Field3: The alert was fired!!! SOS!!!
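The propagation rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this guide, not code from the Pandora FMS server: the function name `resolve_fields` and the two example.com addresses are hypothetical placeholders.

```python
from typing import Dict, Optional

FIELDS = ("field1", "field2", "field3")


def resolve_fields(
    template: Dict[str, Optional[str]],
    action: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Return the field values that would reach the command.

    A field defined in the action wins; an empty or missing action
    field falls back to the value defined in the template.
    """
    resolved = {}
    for name in FIELDS:
        action_value = action.get(name)
        resolved[name] = action_value if action_value else template.get(name)
    return resolved


# Hypothetical addresses, used only for illustration.
template = {
    "field1": "template-recipient@example.com",
    "field2": "[Alert] The alert was fired",
    "field3": "The alert was fired!!! SOS!!!",
}
action = {"field1": "action-recipient@example.com"}

command_fields = resolve_fields(template, action)
print(command_fields)
```

Here field1 comes from the action, while field2 and field3 come from the template because the action leaves them empty.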
1.2 Defining a single Alert
Now suppose we have a single need: to monitor one module that has numerical values. In our case it is a module that measures the system CPU; in other cases it could be a temperature sensor that reads the value in degrees Celsius. Let's first make sure that our module receives the data correctly:
In this screenshot, we can see that we have a module called cpu_sys with a current value of 7. We want the system to fire an alert when the value reaches 20 or more. To do this, we configure the module so that it goes to CRITICAL status when the value reaches 20. Click on the wrench icon to configure the monitor behavior:
We modify the value highlighted in red, as shown in the following screenshot:
Accept and save the changes. Now, when the CPU module value goes up to 20 or higher, it will change status to CRITICAL and it will be marked in red, as we can see here.
The system now knows how to recognize when something is right (OK, green) and when it is wrong (CRITICAL, red). What we want next is for Pandora FMS to send us an email when the module changes to this status. To do so, we will use the Pandora FMS alert system.
The first thing to do is make sure that there is at least one command that does what we need (sending an email). This example is easy, because Pandora FMS includes a default command for sending email.
1.3 Configuring the Alert
Now, we have to create an action called "Send an email to the operator". Let's do it: go to the menu -> Alerts -> Actions and click to create a new action:
This action uses the command "Send email" and is really simple: you only need to fill in one field (Field 1) and leave the other two empty. This is one of the most confusing parts of the Pandora FMS alert system: what are field1, field2 and field3?
These fields are used to "pass" information from the alert template to the command, so both the template and the action can supply different information to the command line. In this case, the action only defines field1, and we leave field2 and field3 to the template, as we will see below.
Field 1 is the one we use to define the operator's email, in this case a fictitious address, "[email protected]".
1.4 Configuring the Template (Alert template)
Now we have to create an alert template that is as generic as possible, so that we can reuse it later. For example, "This is wrong because I have a module in Critical status", which by default sends an email to the operator. Go to the administration menu -> Alerts -> Templates and click on the button to create a new alert template:
The element that defines the condition is the "Condition" field. In this case, it should be set to "Critical status", so that this template, when associated with a module, is fired when that module goes to critical status. We previously configured the "cpu_sys" module to go to critical status when its value reaches 20 or more.
The priority defined here as "Critical" is the priority of the alert, which has nothing to do with the "Critical" status of the module. The criticality of alerts allows us to visualize them in other views, such as the event view, with different identifiers.
Go to step 2 by clicking on the "next" button:
Step 2 defines the fine-grained configuration values of the alert template's trigger condition. The first ones are quite simple: they restrict the alert to specific days and to a specific time period.
The most critical parameters here are the following:
- Time threshold: set to one day by default. If a module stays down for, say, an entire day and the time threshold is 5 minutes, we would receive an alert every 5 minutes. If we set it to one day (24 hours), the alert is sent only once, when it is triggered. If the module recovers and then fails again, the alert is sent again, but if the module stays down after that second failure, the system will not send another alert until a full 24 hours have passed.
- Min. Number of alerts: the minimum number of times the condition must occur (in this case, the module being in CRITICAL status) before Pandora FMS executes the actions associated with the alert template. This avoids false positives that would flood you with alerts, and prevents erratic behavior (bouncing) from firing many alerts. If we put 1 here, the condition must already have occurred once before the system takes it into account. If we put 0, the alert fires the first time the condition is met.
- Max. Number of alerts: 1 means that it will execute the action only once. If we set it to 10, it'll execute the action 10 times. It's a way to limit the number of times an alert can be executed.
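The interaction between the minimum and maximum number of alerts can be illustrated with a small simulation. This is a simplified model built only from the description above, assuming all occurrences fall inside a single time threshold window. It is not the server's actual implementation, and the function name `fired_occurrences` is hypothetical.

```python
from typing import List


def fired_occurrences(occurrences: int, min_alerts: int, max_alerts: int) -> List[int]:
    """Return which occurrences (1-based) of the condition execute the action.

    Simplified model: the first `min_alerts` occurrences are ignored,
    and the action is executed at most `max_alerts` times.
    """
    fired: List[int] = []
    for n in range(1, occurrences + 1):
        if n > min_alerts and len(fired) < max_alerts:
            fired.append(n)
    return fired


# Min 0, max 1: only the first occurrence executes the action.
print(fired_occurrences(5, min_alerts=0, max_alerts=1))
# Min 1, max 1: the first occurrence is ignored, the second one fires.
print(fired_occurrences(5, min_alerts=1, max_alerts=1))
```

Under this model, raising the minimum delays the first notification, while the maximum caps how many notifications are sent within the window.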
Here we find the fields field1, field2 and field3 again. Field1 is blank, because it is the one we defined when we configured the action. In the email action, field2 and field3 define the subject and the text of the message, whereas field1 defines the recipients (separated by commas). So the template, using some macros, defines the subject and the message of the alert. In our case we will receive a message like the following (supposing that the agent containing the module is called "Farscape"):
To: [email protected]
Subject: [PANDORA] Farscape cpu_sys is in CRITICAL status with value 20
Email text:
This is an automated alert generated by Pandora FMS
Please contact your Pandora FMS for more information. *DO NOT* reply to this email.
Given that the default action is the one we defined previously, all the alerts that use this template will use this predefined action by default, unless it is modified.
In the third step, we see that it is also possible to configure the alert system to notify us when the alert has recovered.
It is almost the same, but field1 is not defined here, because the one defined in the previously executed action (when the alert was fired) is reused. In this case it will only send an email with a subject saying that the condition in the cpu_sys module has recovered.
Alert recovery is optional. Note that if fields (field2 and field3) are defined in the alert recovery data, they ignore and overwrite the action fields; that is, they take priority over them. The only field that cannot be modified is field1.
1.5 Associating the Alert to the Command
Now that we have everything we need, we only have to associate the alert template with the module. To do this, go to the alert tab of the agent where the module is:
It's easy. In the following screenshot we can see an already configured alert for a module named "Last_Backup_Unixtime", using the same template that we defined before, "Module critical". Now, with the controls below, we create an association between the module "cpu_sys" and the alert template "Module critical". By default it shows the action that we defined for this template, "Send email to Sancho Lerena".
1.6 Scaling Alerts
The values placed under the "Number of alerts match from" parameter define the alert scaling. This allows us to "redefine" the alert behavior a little more: if we have defined a maximum of 5 times for an alert to be fired, and we only want it to send us an email once, then we should enter a 0 and a 1, so that it only sends us an email from time 0 to 1 (that is, once).
We can add more actions to the same alert and use the "Number of alerts match from" field to define the alert behavior depending on how many times it has been fired.
For example: we want it to send an email to XXXXX the first time it happens and, if the monitor continues being down, to send an email to ZZZZ. To do this, after associating the alert, we can add more actions to a previously defined alert in the assigned alerts table, as we can see in the following screenshot:
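The escalation idea can be sketched as a lookup from "how many times the alert has already fired" to the actions that apply. This is a rough model under stated assumptions: the range is read as half-open (from 0 to 1 means only the first firing), the 1 to 5 range for the second action is an illustrative choice, and the names `EscalationStep` and `actions_for` are hypothetical. The server's exact matching rule may differ.

```python
from typing import List, NamedTuple


class EscalationStep(NamedTuple):
    action: str
    fires_from: int  # first firing (0-based) the action applies to
    fires_to: int    # firing at which the action stops applying


def actions_for(fire_index: int, steps: List[EscalationStep]) -> List[str]:
    """Return the actions to execute for the given 0-based firing."""
    return [s.action for s in steps if s.fires_from <= fire_index < s.fires_to]


steps = [
    EscalationStep("Email XXXXX", 0, 1),  # first firing only
    EscalationStep("Email ZZZZ", 1, 5),   # assumed range for the later firings
]

for index in range(6):
    print(index, actions_for(index, steps))
```

With these ranges, the first firing notifies XXXXX, the next four notify ZZZZ, and further firings execute no action.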
1.7 Standby alerts
Alerts can be enabled, disabled or in standby mode. The difference between disabled and standby alerts is that disabled alerts simply do not work and are therefore not shown in the alerts view. Standby alerts are shown in the alerts view and do work, but only at display level: the view shows whether they are fired or not, but they do not execute the configured actions and do not generate events.
Standby alerts are useful for watching their status without affecting anything else.
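The three modes can be summarized as a small lookup table. The sketch below only restates the behavior described in this section; the names `AlertMode`, `Behaviour` and `BEHAVIOUR` are hypothetical and do not correspond to anything in Pandora FMS itself.

```python
from enum import Enum
from typing import NamedTuple


class AlertMode(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    STANDBY = "standby"


class Behaviour(NamedTuple):
    shown_in_alert_view: bool
    runs_actions: bool
    generates_events: bool


# Behavior of each mode, as described in this section.
BEHAVIOUR = {
    AlertMode.ENABLED: Behaviour(True, True, True),
    AlertMode.DISABLED: Behaviour(False, False, False),
    AlertMode.STANDBY: Behaviour(True, False, False),
}

for mode, behaviour in BEHAVIOUR.items():
    print(mode.value, behaviour)
```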
1.8 Using Alert Commands different from the email
The email command is internal to Pandora FMS and cannot be configured: field1, field2 and field3 are predefined as the recipient, subject and text of the message. But what happens if I want a different action, defined by me?
We are going to define a new command, something completely our own. Imagine that we want to create a log file with each alert that is fired. The format of this log file should be something like:
DATE_HOUR - NAME_AGENT - NAME_MODULE - VALUE - PROBLEM DESCRIPTION
Where VALUE is the value of the module at that moment. There will be several log files, depending on the action that calls the command. The action defines the description and the file the events are written to.
To do this, we first create a command as follows:
And we're going to define an action:
If we take a look at the log that we've created:
2010-05-25 18:17:10 - farscape - cpu_sys - 23.00 - Custom alert for LOG#1
We can see that the alert was fired at 18:17:10 in the "farscape" agent, in the "cpu_sys" module, with a value of "23.00" and with the description that we chose when we defined the action.
Because the command execution, the field order and other details can make it hard to understand how the command is finally executed, the easiest approach is to enable the debug traces of the Pandora server (verbose 10) in the server configuration file /etc/pandora/pandora_server.conf, restart the server (/etc/init.d/pandora_server restart), and then look in /var/log/pandora/pandora_server.log for the exact line with the execution of the alert command we defined, to see how the Pandora FMS server is firing the command.
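As an illustration of the log format, the following Python sketch builds a line with the same layout as the one shown above. It is not the command used in this guide (which is defined in the Pandora FMS console); the helper names `format_log_line` and `append_log_line` are hypothetical, and could be the body of a script called by a custom command.

```python
from datetime import datetime


def format_log_line(when: datetime, agent: str, module: str,
                    value: float, description: str) -> str:
    """Build a line in the DATE_HOUR - AGENT - MODULE - VALUE - DESCRIPTION layout."""
    timestamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return "{} - {} - {} - {:.2f} - {}".format(
        timestamp, agent, module, value, description
    )


def append_log_line(path: str, line: str) -> None:
    """Append one alert line to the log file chosen by the action."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


line = format_log_line(
    datetime(2010, 5, 25, 18, 17, 10),
    "farscape", "cpu_sys", 23.0, "Custom alert for LOG#1",
)
print(line)
```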
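Searching the server log for the command line can be done with any text tool. As one possible approach, the Python sketch below filters a log for lines containing a given string. The function names are hypothetical, and nothing is assumed about the log format beyond it being plain text. The search string would be a fragment of the command you defined.

```python
from typing import Iterable, List


def find_matching_lines(lines: Iterable[str], needle: str) -> List[str]:
    """Return the lines that contain `needle`, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if needle in line]


def search_log(path: str, needle: str) -> List[str]:
    """Search a plain-text log file for lines containing `needle`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_matching_lines(handle, needle)
```

For example, `search_log("/var/log/pandora/pandora_server.log", "Custom alert for LOG#1")` would return the lines that mention the description used in the action above, provided the server writes the executed command to its log at that verbosity level.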