Unreachability and notifications to many
#7
Quote:About notifications: I mean that the best solution is to group actions into one group. For example, I have a group "notify-administrators" and I have attached all my notifications to that group, so if we hire another person I only have to modify one place. The current way is not flexible. About default actions for an alert template: fine, but I can't add multiple default actions to an alert template, or am I missing something?

You can always put a list of people in the field used to send an email ([email protected], [email protected]) and use that action like a group. I'm not sure that works, but try it, and if it doesn't, open a bug asking for the ability to send a mail to multiple users internally. I think this is a good option. About the multiple defaults: yes, having only one default action is a limitation at this time :-(

Quote:I hope it shows properly. So in that case:
- everything is UP (I mean CORE, A, B, C, D) and X is down: X is in state DOWN
- everything is UP and X is up: X is in state UP
- A and C are down and X is also down: X should be in state DOWN, because clearly something is wrong with that switch, since it has an alternative path via D
- A, B, C, D are down and X is also down: X should be in state UNREACHABLE (both of its parents are down, so we assume that is the reason)
- A, B, C, D are down and X is up: it means that those down switches have something wrong with their administration (for example a misconfigured VLAN 1), but in that situation X should be in state UP
- A, B are down, C and D are up, X is up: X should be in state UP (same situation as the previous one)
- A, B are down, C and D are up, X is down: X should be in state DOWN
- A, B are up, C, D and X are down: X should be in state UNREACHABLE
So, to put it in simple words: if every parent of host X is in state DOWN or UNREACHABLE and X is down, then X is UNREACHABLE.
If at least one parent of host X is in state UP and host X is down, then X is DOWN.
That's it. It has always worked for me that way, and if anybody has a more complicated situation (for example, a webapp that depends on a router and also on a "proxy" service on another server), they can add a manual correlation between services.
So it would be great if there were another host state, UNREACHABLE, and the person who uses Pandora could decide how to treat that state: as unreachable or as down. It would make Pandora way more flexible.
Thanks!
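
The rule in the quote above can be written down as a small function. This is only a sketch of the rule as described, not code from Pandora FMS: the `State` enum and `resolve_state` are hypothetical names, and reading the cases as "X has parents C and D" is an inference from the examples given.

```python
from enum import Enum


class State(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNREACHABLE = "UNREACHABLE"


def resolve_state(host_responds, parent_states):
    """Return the state of a host from its own check and its parents' states.

    host_responds: True if the host answered its own check (e.g. a ping).
    parent_states: list of State values, one per parent of the host.
    """
    # A host that answers is UP, whatever its parents report.
    if host_responds:
        return State.UP
    # The host is down. If it has parents and none of them is UP,
    # the failure is attributed to the path, not to the host itself.
    blocked = (State.DOWN, State.UNREACHABLE)
    if parent_states and all(s in blocked for s in parent_states):
        return State.UNREACHABLE
    # At least one parent is UP (or there are no parents): really DOWN.
    return State.DOWN


# X is down, parent C is down but parent D is up: X is DOWN.
print(resolve_state(False, [State.DOWN, State.UP]))
# X is down and both parents are down: X is UNREACHABLE.
print(resolve_state(False, [State.DOWN, State.DOWN]))
```

A host with no parents (such as CORE) is treated here as DOWN when it does not answer, since there is nothing upstream to blame; that choice is an assumption, the quote does not cover it.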

Very, very interesting. This is really complex to implement, and I figure it's not easy to set up in other tools either. In Pandora you can do it using correlation, BUT the individual alerts will still fire, even if they don't execute any actions. You need the individual alerts to be defined before you create a correlation rule/alert.

You can represent your UNREACHABLE status with our WARNING status, and use NORMAL and CRITICAL as UP and DOWN. We have another state, UNKNOWN, for systems that you cannot contact and don't know whether they are up or down. It sounds like your UNREACHABLE, but this state is induced from different conditions rather than set by an active check, so it's complex to manage, because you cannot know exactly when a device becomes Unknown.

Some days ago, after this conversation, I decided to implement a kind of "cascade protection" for the alerting system:

This option is designed to avoid a "storm" of alerts caused by a group of agents becoming unreachable. This kind of behaviour happens when an intermediate device, for example a router, is down and all the devices behind it are simply not reachable. Those devices are probably not down, and may even be working behind another router in HA mode, but if you do nothing, Pandora FMS will probably think they are down because it cannot test them remotely with a Remote ICMP Proc test (a ping).

When you enable cascade protection on an agent, this means that if its parent has a CRITICAL alert fired, the agent's alerts WILL NOT be fired. If the agent's parent only has a module in CRITICAL status, or has fired alerts with a criticity lower than CRITICAL, the agent's alerts will be fired whenever they should be. Cascade protection checks the parent's alerts with CRITICAL criticity, including the correlation alerts assigned to the parent.
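
To make the rule concrete, here is a minimal sketch of the check described above. It is an illustration only, not the Pandora FMS implementation: the `Alert` and `Agent` classes, their fields, and `should_fire` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    name: str
    priority: str          # e.g. "WARNING" or "CRITICAL" (hypothetical values)
    fired: bool = False


@dataclass
class Agent:
    name: str
    parent: Optional["Agent"] = None
    cascade_protection: bool = False
    alerts: List[Alert] = field(default_factory=list)


def should_fire(agent):
    """Return True if the agent's alerts are allowed to fire."""
    # Without cascade protection, or without a parent, nothing is suppressed.
    if not agent.cascade_protection or agent.parent is None:
        return True
    # Suppress only when the parent has a fired alert of CRITICAL priority.
    # Fired alerts of lower priority on the parent do not suppress anything.
    parent_critical = any(
        a.fired and a.priority == "CRITICAL" for a in agent.parent.alerts
    )
    return not parent_critical


router = Agent("router", alerts=[Alert("router ping", "CRITICAL", fired=True)])
server = Agent("server", parent=router, cascade_protection=True)
print(should_fire(server))  # the router has a fired CRITICAL alert
```

Note that the check looks at the parent's alerts, not at the parent's module status, which matches the description: a parent module in CRITICAL without a fired CRITICAL alert does not block the child.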

If you want a more advanced cascade protection system, just use correlation between successive parents and enable Cascade Protection in the children.

Combined with correlation alerts, this could result in a very flexible alerting system that is not too heavy to administer. I suggest trying our current 3.0 development version; these things are more complex to explain in messages than to test and check personally :-)
Messages In This Thread
Unreachability and notifications to many - by slerena - 09-01-2009, 06:55 AM




(c) 2006-2018 Artica Soluciones Tecnológicas. Contents of this wiki are under Create Common Attribution v3 licence. | pandorafms.com | pandorafms.org
