Some agent drops that occur without any apparent reason can be explained by the way Pandora FMS works, specifically by the init script /etc/init.d/pandora_agent_daemon.
In this case, the problem has been detected in systems with a virtualization infrastructure based on Virtuozzo®, where both the hardware nodes and the virtual machines running on top of them are monitored. Because Virtuozzo® performs software virtualization, the processes that run on the virtual machines are fully visible when running ps on the hardware node.
[root@xxxxxxx ~]# ps -Af | grep pandora_agent
root 20643 17147 0 Aug25 ? 00:00:08 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
root 22665 20918 0 Aug25 ? 00:00:08 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
root 22862 21033 0 13:13 pts/5 00:00:00 grep pandora_agent
root 23518 1 0 Aug25 ? 00:00:12 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
root 24839 31440 0 Aug25 ? 00:00:08 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
root 26703 24727 0 Aug25 ? 00:00:07 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
root 28677 16959 0 Aug25 ? 00:00:08 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
root 29524 301 0 Aug25 ? 00:00:08 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
When the PID of pandora_agent is extracted through ps (in order to execute a kill, for example), what we obtain is the identifier of the first agent listed among all those that are running. Statistically, it is hardly ever the one we want: the one corresponding to the hardware node.
When executing stop or restart on the agent of a hardware node, we have in fact been applying the command to an unspecified agent on one of the virtual machines running on top of it. When the stop command is issued on a virtual machine, there is no problem, because ps there can only see that machine's own processes.
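The effect described above can be reproduced with a minimal sketch. It assumes the PID is obtained by taking the first line that matches pandora_agent; the exact extraction command used by the init script is not shown here, so the helper first_agent_pid is hypothetical and only mimics that behavior on a shortened copy of the listing above.

```shell
# Shortened copy of the ps listing seen from the hardware node.
sample_ps='root 20643 17147 0 Aug25 ? 00:00:08 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
root 22665 20918 0 Aug25 ? 00:00:08 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
root 23518 1 0 Aug25 ? 00:00:12 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora'

# Hypothetical helper (not taken from the init script): reads a ps listing
# on stdin and prints the PID (second column) of the first matching agent.
first_agent_pid() {
    grep 'pandora_agent' | grep -v 'grep' | head -n 1 | awk '{print $2}'
}

picked=$(printf '%s\n' "$sample_ps" | first_agent_pid)
echo "PID that would receive the kill: $picked"
```

The selected PID is simply the first one in the listing, regardless of which machine the process belongs to, so a stop issued on the hardware node can end up hitting an agent inside a virtual machine.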
The solution is simply to rename /etc/pandora to /etc/pandora_virt, then edit the file /etc/init.d/pandora_agent_daemon and replace the following line:
PANDORA_PATH=/etc/pandora
with
PANDORA_PATH=/etc/pandora_virt
With this change, starting or stopping the agent on the “real” machine no longer affects the agents on the virtual machines.
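The edit to the init script can be sketched as follows. This is only an illustration under stated assumptions: it works on a scratch copy created with mktemp so that it is safe to run anywhere, and it assumes the assignment appears on a line of its own exactly as shown above. On a real system the target would be /etc/init.d/pandora_agent_daemon, which should be backed up first, and the directory rename would be done with mv /etc/pandora /etc/pandora_virt.

```shell
# Scratch copy standing in for /etc/init.d/pandora_agent_daemon
# (assumption: the real script contains this exact assignment line).
workdir=$(mktemp -d)
script="$workdir/pandora_agent_daemon"
printf '%s\n' '#!/bin/sh' 'PANDORA_PATH=/etc/pandora' > "$script"

# Rewrite only the exact PANDORA_PATH assignment; other lines are untouched.
sed 's|^PANDORA_PATH=/etc/pandora$|PANDORA_PATH=/etc/pandora_virt|' "$script" > "$script.new"
mv "$script.new" "$script"

# Show the resulting assignment.
result=$(grep '^PANDORA_PATH=' "$script")
echo "$result"
```

Writing to a temporary file and moving it into place avoids relying on sed -i, whose behavior differs between sed implementations.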