agent_keepalive alerts not happening
Posted by steveybaby on January 8, 2008 at 07:32
Hello guys – first of all I want to say thanks for a great bit of software – this does just what I need, and does it well.
The only unsolved problem in my installation is sending an alert when the agent, or the machine running the agent, shuts down. I saw the earlier post about the “bug” where keepalive alerts were being broken by ping alerts – and this is not my situation.
All I have is a handful of proc and data modules assigned to my agent. I created the agent_keepalive alert, and assigned it 0/0 for max/min.
I’ve turned off the agent and the server and hoped to receive the alert – but it doesn’t happen.
Any ideas what I can look at?
::
Thanks ! :). Nice place San Francisco, I was there on holidays !!
There is a bug there. I’m working on it now, because it was left unfinished in release 1.3 and is a source of other problems too. Don’t worry, I think this should be easy to fix. I’ll post the solution here in a few hours.
-
-
::
I’ve just uploaded new code for this issue. It affects the SQL data, the data server and the console as well, so it is neither a minor nor a stable change. It works for me now: keepalive is now a “standard” module that gets its input from agent contact data (from any source, data server or network server). Any server can detect a keepalive that has passed its time limit and raise the alerts assigned to this module. The module can also be deleted from the console.
If you want to try it, I will be pleased to assist you in any way I can. You need to access the SVN code and apply the changes manually. That is not easy if you haven’t worked with SVN before, but there are no technical difficulties. This is the official code for 1.3.1, a minor fix release scheduled for February.
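As a rough illustration of the idea described above (not Pandora's actual server code), a keepalive value can be derived from the agent's last contact time: it reads 1 while the agent has reported within its time limit and 0 once it has not. The function name, the 300-second interval and the tolerance factor of two are assumptions made up for this sketch.

```python
from datetime import datetime, timedelta


def keepalive_value(last_contact: datetime, now: datetime,
                    interval_s: int = 300, tolerance: float = 2.0) -> int:
    """Return 1 if the agent reported within its time limit, else 0.

    Hypothetical helper: interval_s and tolerance are illustrative
    assumptions, not values taken from Pandora FMS.
    """
    limit = timedelta(seconds=interval_s * tolerance)
    return 1 if (now - last_contact) <= limit else 0


now = datetime(2008, 1, 8, 12, 0, 0)
print(keepalive_value(now - timedelta(minutes=5), now))  # agent seen recently
print(keepalive_value(now - timedelta(hours=15), now))   # agent silent for hours
```

Because the value depends only on the last contact time, any server that can read that timestamp could compute it, which matches the description of keepalive as an ordinary module.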
-
-
::
Well, I did an svn checkout. I replaced the console and server code with the new version. It’s still not working – I guess I missed something. What are the “SQL data” and “data server” changes I need to make?
I did notice that the icon for the keepalive monitor is now broken and I had to delete it from my configured agents. So something had changed.
-
::
I’m working on a tarball for 1.3.1 beta. A minimal upgrade is needed (very simple) and no coding skills are required. It will take me one more day. I want to fix several problems detected in 1.3.0 and add a few small new features.
-
::
The last commit on the 1.3.1 branch solves all the problems with keepalive, and it’s very stable (tested for several days on several different environments, some of them under heavy load). Take a look at the project website:
http://pandora.sourceforge.net/en/index.php?sec=downloads
To “migrate” to 1.3.1 you only need to insert one new row into the ttipo_modulo table, as shown here:
INSERT INTO `ttipo_modulo` VALUES (100,'keep_alive',-1,'KeepAlive','mod_keepalive.png');
-
::
Don’t forget to take a look at:
-
-
-
::
Hey Nil,
I am having a similar problem with the 1.3.1 build. I am also a new user and have been tooling around with the app for about a week now, learning the ins and outs. I have the latest SVN; I upgraded from 1.3. Everything dropped into place with no problem, but I am having a problem with it sending alerts:
1 – In Manage Alerts I had to change the email alert from “echo _field3_ | sendmail -s _field2_ _field1_”, as sendmail doesn’t recognize the -s. I changed it to “echo _field3_ | mail -s _field2_ _field1_”; mailx will work as well.
2 – I have an alert set up to check whether the agent is down. The min/max value is set to 1: if the value is anything other than 1, send an alert. It has a Min Alert of 1, a Max Alert of 5 and a time threshold of 5 min. It was working fine until I restarted the agent in question and the alerts stopped; however, I did not get the [RECOVERED] message that is supposed to be delivered.
I took the agent down today and the alerts do not fire.
Any idea what could be the problem?
-Al
BTW Well done on a great product!
-
::
Please take a look at these topics in the Wiki:
Regards,
Raúl
-
-
-
-
::
I think I may have found the problem. I have had the agent down for 15.6 hours and no alert has fired. I checked the data tab for the agent_keepalive module and it says that it is currently 0. That tells me it thinks the agent is down.
My alert is specifically set to fire if the value is anything other than 1 (agent up), yet the alert does not fire. It seems like this is a bug.
-
::
Are you using the current 1.3.1 development version? If you are, could you send us a screenshot? This has been fixed in 1.3.1 and tested by several users 😕
Grab a tarball of the current 1.3.x development version from:
http://artica.homelinux.com/pandora_tarball/1.3.x/?C=M;O=D
If not, take a look at the following page, and don’t forget to run the small migration script (.sql) to upgrade from 1.3.0 to 1.3.1; it’s very easy:
http://www.openideas.info/wiki/index.php?title=Pandora_1.3:Documentation_en:1.3_to_1.3.1
-
::
Hi guys,
I also have the problem that Pandora doesn’t fire any alert if the agent is down. I have updated my Pandora system to version 1.3.1 and also applied the SQL update.
Are there any other mistakes you can make when configuring an alert?
Other alerts for modules like “check host alive” over ICMP are working fine.
Kind regards,
Royal TS
-
::
Could you describe in more detail the full process you followed to set it up? I suppose you have activated alert recovery notification in the pandora_server.conf file, haven’t you?
-
::
Hi Nil,
The recovery notification in pandora_server.conf is active and working.
I have the following setup in use:
Alert Type: email
Alert Status: enabled
Min Value: 1
Max Value: 1
Alert text: Agent down
Field #1: email Address
Field #2: _Agent_ down
Field #3: Text
Time From: 0:00, Time To: 0:00
Time threshold: 30 min
Min Number of Alerts: 1
Max Number of Alerts: 3
Assigned Module: agent_Keepalive
I hope I’ve just made a small mistake in my line of thought 😕
greetz
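To make the settings listed above easier to reason about, here is a minimal sketch of how such a rule might be evaluated. It follows the reading used earlier in the thread (any value other than 1 counts as a violation) and is not Pandora's actual alert engine: the class, the field names and the way the counters reset after the time threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    # Field names are hypothetical; values mirror the settings in the post.
    min_value: float = 1
    max_value: float = 1
    threshold_s: int = 30 * 60  # time threshold: 30 min
    min_alerts: int = 1         # violations needed before the first alert
    max_alerts: int = 3         # cap on alerts sent per time threshold


def fired_alerts(rule, samples):
    """samples: list of (timestamp_s, value); returns timestamps that alert.

    Assumed semantics: a value outside [min_value, max_value] is a
    violation, and the counters reset once the time threshold has elapsed.
    """
    fired, window_start, violations, sent = [], None, 0, 0
    for ts, value in samples:
        if window_start is None or ts - window_start >= rule.threshold_s:
            window_start, violations, sent = ts, 0, 0
        if rule.min_value <= value <= rule.max_value:
            continue  # value in range: nothing to report
        violations += 1
        if violations >= rule.min_alerts and sent < rule.max_alerts:
            fired.append(ts)
            sent += 1
    return fired


# Agent up (1) for two samples, then down (0) every 5 minutes.
samples = [(0, 1), (300, 1)] + [(600 + 300 * i, 0) for i in range(7)]
print(fired_alerts(AlertRule(), samples))
```

Under this reading, a keepalive value of 0 falls outside the 1/1 range, so the configuration above would be expected to fire, at most three times per 30-minute window.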