Alert and/or Agent relationships
One of the features I am missing in Pandora, which I have used in other monitoring solutions, is the ability to form relationships between various objects.
The normal use for these relationships is to avoid being flooded with events. I cannot see such a mechanism in Pandora, so I would like to propose the following as an enhancement request. I would do it myself, but it involves changing the database schema, so I thought I’d float it here first.
I appreciate that what I am proposing might not suit everyone, so I would like to bounce the idea around a bit first.
Phase 1 – Establishing relationships between alerts on the same agent.
Every alert is assigned a priority. When checking the monitors, if an alert is about to fire, check whether any higher priority alert is already in the ‘fired’ state for that agent. If so, don’t perform the alert action, but still set the alert to fired.
Say I have an ‘icmp_proc’, some ‘snmp_inc’s and a couple of ‘tcp_proc’ modules associated with an agent, and I make the icmp alert the highest priority and the alerts for the others all lower. If the agent is down, the icmp_proc alert will fire and I will get told about it. The others will also all fire because the agent is down, but I already know that, so I don’t need those alerts telling me so.
I haven’t thought through what to do at the recovery end, but one idea is that when an alert ceases, it also ceases all lower priority alerts for the same agent without sending their emails. If any module is still in a fault state, its alert will fire again shortly when the next test is done.
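To make Phase 1 a bit more concrete, here is a rough sketch in Python. It is not Pandora code: the Alert and Agent classes, the fire and cease functions, and the convention that a lower number means a higher priority are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    name: str
    priority: int        # assumption: lower number = higher priority
    fired: bool = False


@dataclass
class Agent:
    name: str
    alerts: list = field(default_factory=list)


def fire(agent, alert):
    """Mark the alert as fired.

    Returns True if the alert action (e.g. the email) should run,
    False if a higher priority alert on the same agent has already fired.
    """
    suppressed = any(
        other.fired and other.priority < alert.priority
        for other in agent.alerts
        if other is not alert
    )
    alert.fired = True   # the state is recorded even when the action is skipped
    return not suppressed


def cease(agent, alert):
    """Clear the alert and silently clear all lower priority alerts.

    Returns the names of the alerts that were cleared without any email.
    A module that is still faulty simply fires again on its next test.
    """
    alert.fired = False
    silenced = []
    for other in agent.alerts:
        if other.priority > alert.priority and other.fired:
            other.fired = False
            silenced.append(other.name)
    return silenced


icmp = Alert("icmp_proc", priority=1)
snmp = Alert("snmp_inc", priority=2)
tcp = Alert("tcp_proc", priority=2)
agent = Agent("server01", [icmp, snmp, tcp])

icmp_notifies = fire(agent, icmp)    # highest priority: action runs
snmp_notifies = fire(agent, snmp)    # suppressed, but still marked as fired
silenced = cease(agent, icmp)        # clears snmp_inc without an email
tcp_notifies = fire(agent, tcp)      # nothing higher is fired now: action runs
```

In a real implementation the priority would presumably be a new column on the alert table, which is the schema change I mentioned above.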
Phase 2 – Establishing relationships between agents.
This one can get a bit tricky, but I like the Nagios approach of establishing trees, i.e. each agent can have a parent. The idea here is similar to that in Phase 1, but I am still thinking about exactly how to use it. The aim, though, is to prevent getting alerts for agents that share a common root. For example, if a router is down, then all the network-based alerts for things on the other side of it are going to fire. I don’t want to get hundreds of alerts when I already know the router is down.
Anyway, I would be interested in what opinions others might have.
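P.S. For Phase 2, one possible reading of the parent idea is sketched below in Python. Again, this is only an illustration under my own assumptions: the parents mapping, the down set and both function names are hypothetical, and none of this exists in Pandora today.

```python
def has_down_ancestor(agent, parents, down):
    """Return True if any ancestor of the agent is currently considered down.

    parents: dict mapping an agent name to its parent name (None for a root).
    down:    set of agent names whose 'agent is down' alert has fired.
    """
    seen = set()                      # guards against accidental loops in the tree
    current = parents.get(agent)
    while current is not None and current not in seen:
        if current in down:
            return True
        seen.add(current)
        current = parents.get(current)
    return False


def should_notify(agent, parents, down):
    """Only perform the alert action if no ancestor is already down."""
    return not has_down_ancestor(agent, parents, down)


# Hypothetical topology: web01 sits behind switch01, which sits behind router01.
parents = {"router01": None, "switch01": "router01", "web01": "switch01"}

down = {"router01", "web01"}
notify_router = should_notify("router01", parents, down)   # root cause: notify
notify_web = should_notify("web01", parents, down)         # behind the router: stay quiet

notify_web_alone = should_notify("web01", parents, {"web01"})  # router is up: notify
```

Walking up the tree on every alert keeps the stored relationship to a single parent reference per agent, which is roughly how I understand the Nagios parent setting to work.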