  • Escalation in alerts

    Posted by tnine on June 9, 2009 at 01:00

    Hi all,
    I’m almost done evaluating Pandora vs Zabbix. So far Pandora is a clear winner, as it has a much better user interface than Zabbix. However, Zabbix has one feature that I can’t seem to find in Pandora. I need an escalation scheme so I can go through our on-call list. Is there a way to create escalation lists?

    Thanks,
    Todd

    Sancho replied 15 years, 8 months ago 4 Members · 6 Replies
  • 6 Replies
  • Sancho

    Administrator
    June 9, 2009 at 17:45

    Can you explain what exactly you mean by “escalation”? If possible, with some examples, please.

    Thanks !

  • tnine

    Member
    June 9, 2009 at 17:54

    No worries. Here’s an example.

    We have 3 groups. On-Call, Admins, Engineers.
    We have a service (on windows), we’ll call it Data Listener.
    Our agent is configured to monitor the process DataListener.exe.

    We want to configure an alert so that if DataListener isn’t running, an alert is sent out. Here is how it would flow.

    Initial Alert

    1. Service stops
    2. Email (which is forwarded to SMS) is sent to group On-Call
    3. If the alert is not acked within 10 minutes, the same message is sent to Admins
    4. If the alert is still not acked after another 10 minutes, the same message is sent to Engineers.

    If at any point the message is acked, the alerts stop.

    Recovery

    Notifies all groups that received an alert email during the initial alert.
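The flow above can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not anything Pandora provides: the class name `Escalation`, its methods, and the constants are hypothetical, and the 10-minute step and group order come from the example.

```python
from dataclasses import dataclass, field
from typing import List

# Group order and step length taken from the example above.
ESCALATION_CHAIN = ["On-Call", "Admins", "Engineers"]
STEP_MINUTES = 10


@dataclass
class Escalation:
    """Tracks one alert as it escalates through the chain (hypothetical helper)."""
    notified: List[str] = field(default_factory=list)
    acked: bool = False

    def tick(self, minutes_since_failure: int) -> List[str]:
        """Return the groups that should be notified now (possibly none)."""
        if self.acked:
            return []  # an ack stops all further alerts
        # One more group becomes due every STEP_MINUTES, capped at the chain length.
        due = min(minutes_since_failure // STEP_MINUTES + 1, len(ESCALATION_CHAIN))
        newly = ESCALATION_CHAIN[len(self.notified):due]
        self.notified.extend(newly)
        return newly

    def ack(self) -> None:
        self.acked = True

    def recover(self) -> List[str]:
        """Recovery goes to every group that received the initial alert."""
        return list(self.notified)


alert = Escalation()
print(alert.tick(0))    # first notification
print(alert.tick(10))   # escalates if still not acked
alert.ack()
print(alert.tick(20))   # nothing more after the ack
print(alert.recover())  # groups to tell about the recovery
```

The point of the sketch is that each group is contacted only when the previous step went unacknowledged, and the recovery notice goes only to the groups that were actually contacted.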

    On a completely separate note, my company is sponsoring a project to implement active checks against web resources. We’re looking at using something like the Selenium IDE to record complex scripts, then using a headless Firefox to run these scripts and integrate the results with Pandora as a plugin. I know version 3 has a basic web checker, but it’s not sufficient for our needs. An actual browser execution will allow companies like mine, which use AJAX controls, to properly check that their site is functioning. I’ve joined the dev list, but I’m still waiting on authorization. Would it be possible to approve me so I can get some input from the developer list before I publish the specs for the competition?

  • manu

    Member
    June 10, 2009 at 09:13

    Hi there

    I assume you are using 3.0.

    In that case this is fairly simple. Let me see if I can explain it clearly:

    You just need to define one template with one default action (SMS)

    Then define 2 actions (email to engineers and email to admins)

    In that template set:

    Min number of alerts: 0
    Max: 1

    Threshold: 10 minutes

    Then assign the default template to the modules (remember, this default template includes the SMS delivery) and select the two actions you created: email to engineers and email to admins.

    Voilà, you get the expected result:

    1 SMS and 1 email to everybody
    After 10 minutes, if the alert is still fired, another SMS and email to engineers and admins
    If you ack the alert, then you’re done. (You can also add a recovery email or SMS to notify that the alert has recovered.)

    Cheers
    Manuel.

  • tnine

    Member
    June 10, 2009 at 14:20

    Currently I’m on version 2 as I need a stable release for production monitoring. Do you guys have a rough ETA on when 3 will be released?

    Also, what is your process for creating a release? I was browsing the Subversion repository, and I noticed that there are no tags for the releases. This is something we need so we can be sure we’re getting the same source on all our servers. Once we validate a version as stable, we need to keep using that version, as our systems require 100% uptime. Do you guys plan to start tagging in the next release?

  • getnetworks

    Member
    June 10, 2009 at 22:44

    Manuel,

    Your reply doesn’t address escalation at all; you simply described how to send the notification to everyone all of the time. tnine only wants the notification to go to the On-Call group on the initial alert. If and only if a second alert triggers 10 minutes later, then send it to both the On-Call group and the Admins group. Finally, if and only if a third alert triggers 10 minutes later (20 minutes after the original alert triggered) send it to all three groups (On-Call, Admins and Engineers). This is a very common escalation process in most established server/network/application management & monitoring applications.
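As a rough sketch of the mapping described above (which alert goes to which groups), assuming the recipient list simply stays at all three groups for any alert after the third. The function name `recipients_for_alert` is hypothetical and not part of Pandora:

```python
from typing import List

# Groups in escalation order, as named in this thread.
GROUPS = ["On-Call", "Admins", "Engineers"]


def recipients_for_alert(alert_number: int) -> List[str]:
    """Return the groups notified on the Nth consecutive alert (1-based).

    Alert 1 goes to On-Call only, alert 2 adds Admins, alert 3 adds Engineers.
    Assumption: alerts beyond the third keep going to all groups.
    """
    if alert_number < 1:
        raise ValueError("alert_number starts at 1")
    return GROUPS[:min(alert_number, len(GROUPS))]


for n in (1, 2, 3):
    print(n, recipients_for_alert(n))
```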

  • Sancho

    Administrator
    June 11, 2009 at 13:07

    Hi tnine.

    You’ve been granted access to the dev list just now; the approval mail was in my spam folder 🙁

    About the escalation, I think Manu tried to explain it, but a few screenshots will be easier to understand.

    In this capture, I’ve defined an alert that detects whether a host is down. It raises a syslog event the first two times it fires, and if it fires more than 2 times (up to 4), it emails me. I think this is exactly the feature you’re looking for. You can let one notification run together with another, or keep them from overlapping. The number of “actions” an alert can trigger is unlimited; you can have 1 or 200.
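To make the idea concrete, here is a minimal Python sketch of actions that are each active for a range of consecutive firings. It mirrors the example (syslog for firings 1 and 2, email for firings 3 and 4); the names `ActionRule`, `first_fire`, `last_fire` and `actions_for` are made up for illustration and are not Pandora's actual settings or API.

```python
from typing import List, NamedTuple


class ActionRule(NamedTuple):
    """An action that runs while the fire count is within [first_fire, last_fire]."""
    name: str
    first_fire: int
    last_fire: int


# Mirrors the example: syslog on the first two firings, email on firings 3 to 4.
RULES = [
    ActionRule("syslog event", 1, 2),
    ActionRule("email", 3, 4),
]


def actions_for(fire_count: int, rules: List[ActionRule] = RULES) -> List[str]:
    """Return the names of the actions triggered on the given consecutive firing."""
    return [r.name for r in rules if r.first_fire <= fire_count <= r.last_fire]


for count in range(1, 6):
    print(count, actions_for(count))
```

Giving two rules overlapping ranges would make their notifications run together; keeping the ranges disjoint, as here, avoids the overlap.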

    The ack feature has been available since version 2.1 and is complementary to this.

    This applies to version 3.0 and later only; it is a big change in the Pandora alerting system.

    About the idea of building a powerful web navigation checker and integrating it into Pandora: GO AHEAD, we will help you with anything you need! Please explain your idea on the developer list and let’s try to work together…