Event correlation alerts: customize your alerts and make your job easier
One of the Pandora FMS features we’re most pleased with here at Pandora FMS HQ is the events console, our intuitive interface that tracks incidents, facilitates teamwork and allows users to take appropriate action. But let’s not forget another events-related feature: event correlation alerts.
If you’re operating an IT ecosystem with thousands of active checks, setting up individual alerts for each one can be a painstaking, time-consuming chore (ask anyone using Nagios ;)). Policies have taken some of the tedium out of monitoring, but there are other options at hand, such as the topic of this article: event correlation alerts.
Group generic alerts
Wouldn’t it be easier to set up one generic alert for all your production routers, so that any router whose connection drops immediately alerts you via email? Sound good? And what if you could add new routers without having to bother configuring new alerts for them? Read on for the skinny…
Generic cluster alerts
The same concept can be applied to many situations, e.g. a dynamic cluster of web services that alerts you when any node is overloaded, no matter which one.
But that’s not all! You can also define alert combination logic, meaning you get a heads-up when, for example, two critical events happen simultaneously or one right after the other, or when several critical latency events appear at (or almost at) the same time.
All this is possible thanks to event correlation alerts, whose trigger conditions, unlike those of standard Pandora FMS alerts, don’t have to be defined 1:1.
Let’s go into a little more depth in the article proper.
First, let’s get reacquainted with our old amigo, the events console, whose task is to provide an intuitive, visual display of the latest goings-on in your monitoring system.
Whether it’s a status change, a new agent being created or an alert being triggered, every new circumstance is reflected in the events console.
Events contain detailed information about whatever generated them, such as the element in question (agent, alert, module…), the exact moment the incident occurred (date, hour, minute…) and the severity of the event. They also give you access to the history, graphs and all other information associated with the element that triggered the event.
Apart from containing information, events let you interact with both systems and your operations team (or other users) via comments.
There’s also something we like to call manual remote actions, which can be carried out on events: assigning or changing an owner, modifying a status, or pinging a host to check that it is up. These options can be extended and customized to integrate with other apps.
The events console displays all the information sysadmins dream of, and more. Now let’s take a look at how to work with an event correlation alerts system.
Event correlation alerts operation overview
Event-based alerts automatically execute a series of actions according to a set of rules applied to the events being generated.
Standard Pandora FMS alerts are individually associated with modules and define triggering conditions that usually take into account the module status or the data the module collects. Event correlation alerts work differently.
Alert triggering is based on a series of configurable filters, or rules. With these filters you can choose which kind of event should trigger the alert, which agent generated it, or even its text content.
The fields selected in your filters are compared against incoming events and, if a match is found, the alert is triggered. For example, if you create a filter that warns you about any critical event on “Host Alive” modules, you effectively create an outage alert for every system.
You can combine several filters with different options in the same alert, so that a range of events can trigger it. This allows full correlation between events and alerts, which is useful when you want a specific alert for simultaneous or repeated failures, e.g. a system that crashes several times a day, or several cluster nodes reporting critical failures within a given time window.
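To make the matching step concrete, here is a minimal Python sketch, assuming events are simple records with agent, module, severity and text fields and that a filter maps field names to regular expressions that must all match. The `Event` class, the `matches` function and the sample events are hypothetical illustrations, not Pandora FMS’s internal schema or API.

```python
import re
from dataclasses import dataclass


# Hypothetical event record; field names are illustrative, not Pandora FMS's schema.
@dataclass
class Event:
    agent: str
    module: str
    severity: str
    text: str


def matches(event: Event, event_filter: dict[str, str]) -> bool:
    """Return True if every filtered field matches its regular expression."""
    return all(
        re.search(pattern, getattr(event, field)) is not None
        for field, pattern in event_filter.items()
    )


# One generic filter: any critical event on a "Host Alive" module, on any agent.
host_down_filter = {"module": r"^Host Alive$", "severity": r"^critical$"}

# Made-up sample events for illustration only.
events = [
    Event("router-01", "Host Alive", "critical", "host unreachable"),
    Event("web-03", "CPU Load", "warning", "CPU above threshold"),
    Event("router-07", "Host Alive", "critical", "host unreachable"),
]

for event in events:
    if matches(event, host_down_filter):
        print(f"ALERT: {event.agent} is down")
```

Because the filter never names a specific agent, a newly added router whose “Host Alive” module goes critical matches it without any extra configuration.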
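Correlating repeated failures adds a time dimension to the matching. The following Python sketch assumes events arrive in timestamp order and uses a hypothetical `WindowCorrelator` class (not part of Pandora FMS) that fires when a threshold number of matching events fall inside a sliding time window, e.g. two critical events from any cluster node within five minutes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    timestamp: int  # seconds; illustrative
    agent: str
    severity: str


class WindowCorrelator:
    """Hypothetical sketch: fire once `threshold` matching events occur within `window` seconds."""

    def __init__(self, threshold: int, window: int, predicate: Callable[[Event], bool]):
        self.threshold = threshold
        self.window = window
        self.predicate = predicate
        self.recent: deque[int] = deque()  # timestamps of recent matching events

    def feed(self, event: Event) -> bool:
        """Process one event (assumed to arrive in time order); return True if the alert fires."""
        if not self.predicate(event):
            return False
        self.recent.append(event.timestamp)
        # Forget matches older than the window, relative to the newest one.
        while event.timestamp - self.recent[0] > self.window:
            self.recent.popleft()
        if len(self.recent) >= self.threshold:
            self.recent.clear()  # reset so one burst raises one alert
            return True
        return False


# Two critical events from any cluster node within 300 seconds trigger the alert.
correlator = WindowCorrelator(threshold=2, window=300,
                              predicate=lambda e: e.severity == "critical")

stream = [
    Event(0, "node-1", "critical"),
    Event(120, "node-2", "critical"),   # second critical within 300 s: alert
    Event(1000, "node-3", "critical"),  # isolated: no alert
    Event(5000, "node-1", "warning"),   # not critical: ignored
]
results = [correlator.feed(e) for e in stream]
print(results)  # [False, True, False, False]
```

Clearing the buffer after firing is one design choice among several: it makes a single burst raise a single alert, whereas keeping the buffer would re-alert on every further matching event inside the window.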
The event alerts system is also handy for warning of inventory changes, unauthorized hardware modifications or non-compliant software installations.
Creating event-based alerts is straightforward and, as with regular alerts, very flexible. To create one, go to the event alerts section and click Create, then fill out a form similar to the regular alert templates.
On the first form, enter the parameters (alert name, default action, alert time range). You can restrict the alert to weekdays, 9 to 5, or whatever works for you.
Follow the wizard through its steps and you’ll then see your alerts list, exactly as you’ve set it up.
Next, define the filters that will trigger the alert. For example, suppose you want an alert to trigger when a “Network traffic” module goes into warning status.
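A time range such as “weekdays, 9 to 5” boils down to a simple check on the moment an event arrives. This Python sketch assumes the schedule is evaluated in the server’s local time; `ACTIVE_DAYS`, `START`, `END` and `alert_active` are hypothetical names, not Pandora FMS settings.

```python
from datetime import datetime, time

# Hypothetical schedule: Monday to Friday, 09:00 to 17:00, server local time.
ACTIVE_DAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday == 0, Sunday == 6
START, END = time(9, 0), time(17, 0)


def alert_active(moment: datetime) -> bool:
    """Return True if an alert may fire at `moment` under the schedule above."""
    return moment.weekday() in ACTIVE_DAYS and START <= moment.time() < END


print(alert_active(datetime(2024, 3, 4, 10, 30)))  # Monday morning -> True
print(alert_active(datetime(2024, 3, 9, 10, 30)))  # Saturday -> False
print(alert_active(datetime(2024, 3, 4, 18, 0)))   # Monday evening -> False
```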
You can stretch the event alerts system even further by creating events automatically from scripts through the API or CLI. This helps with unknown unknowns, such as creating events from SNMP traps, from the text content of SMS messages or from Telegram messages, as well as with third-party integrations.