Intermittent SNMP errors
Posted by jncraton on September 15, 2008 at 12:52
I recently installed and began testing Pandora as a replacement for Cacti and Nagios. I am currently using it to monitor several servers as well as our main router. It seems to be working fine, except that the bandwidth data polled via SNMP (inoctets/outoctets) often comes back as zero instead of the proper value. For example, here is some of the output from our router:
2008-09-15 13:45:58 4.18 M
2008-09-15 13:40:42 4.39 M
2008-09-15 13:35:27 0
2008-09-15 13:30:12 4.37 M
2008-09-15 13:24:57 4.53 M
2008-09-15 13:19:42 0
2008-09-15 13:14:27 5.13 M
2008-09-15 13:09:12 0
2008-09-15 13:03:57 4.66 M
2008-09-15 12:58:42 3.71 M
2008-09-15 12:53:27 0
2008-09-15 12:48:12 4.74 M
2008-09-15 12:42:57 4.98 M
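To put a number on how often the readings drop out, the values listed above can be tallied with a short script. This is only a rough sketch: the sample text is copied from the post, and the helper names (`parse_samples`, `zero_fraction`, `intervals_seconds`) are made up for illustration and are not part of Pandora.

```python
from datetime import datetime

# Readings copied from the post (timestamp, then the value as displayed).
SAMPLES = """\
2008-09-15 13:45:58 4.18 M
2008-09-15 13:40:42 4.39 M
2008-09-15 13:35:27 0
2008-09-15 13:30:12 4.37 M
2008-09-15 13:24:57 4.53 M
2008-09-15 13:19:42 0
2008-09-15 13:14:27 5.13 M
2008-09-15 13:09:12 0
2008-09-15 13:03:57 4.66 M
2008-09-15 12:58:42 3.71 M
2008-09-15 12:53:27 0
2008-09-15 12:48:12 4.74 M
2008-09-15 12:42:57 4.98 M
"""


def parse_samples(text):
    """Return a list of (datetime, displayed_value) tuples, oldest first."""
    rows = []
    for line in text.strip().splitlines():
        date, time, *value = line.split()
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        rows.append((stamp, " ".join(value)))
    return sorted(rows)


def zero_fraction(rows):
    """Fraction of readings whose displayed value is exactly "0"."""
    zeros = sum(1 for _, value in rows if value == "0")
    return zeros / len(rows)


def intervals_seconds(rows):
    """Seconds elapsed between each pair of consecutive readings."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(rows, rows[1:])]


rows = parse_samples(SAMPLES)
print(f"{zero_fraction(rows):.0%} of readings are zero")
print("distinct polling intervals (s):", sorted(set(intervals_seconds(rows))))
```

Tracking the zero fraction before and after each configuration change gives a simple way to tell whether a change actually helped.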
I am currently running the Pandora VMware image (v1.3.1) on an ESX server. I’m not sure whether this is specifically a Pandora problem, since errors tend to be much less frequent when bandwidth usage is lower, but any help with this issue would be appreciated.
Thanks in advance.
::
Try raising the SNMP network timeout a bit in /etc/pandora_server.conf. This looks like a problem contacting your SNMP device, probably caused by connection delays.
If this doesn’t work (I’m almost sure it will solve your problem), use snmp_checks 2 instead of 1 (the default).
Default values are:
snmp_checks 1
snmp_timeout 3
First try with:
snmp_checks 1
snmp_timeout 6
If that doesn’t work, try with:
snmp_checks 2
snmp_timeout 5
I hope this helps.
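To check whether the device itself is slow to answer, independent of Pandora, the same timeout and retry idea can be tried by hand with Net-SNMP's `snmpget`. The sketch below is only an illustration: it assumes `snmpget` is installed, SNMP v2c, a `public` community, and interface index 1, and the helper names `build_snmpget_cmd` and `probe` are hypothetical. The `-t` (timeout in seconds) and `-r` (retries) flags are only loosely comparable to `snmp_timeout` and `snmp_checks`.

```python
import subprocess


def build_snmpget_cmd(host, oid, community="public", timeout=6, retries=0):
    """Build an snmpget command line with an explicit timeout and retry count."""
    return [
        "snmpget", "-v", "2c", "-c", community,
        "-t", str(timeout),   # seconds to wait for each reply
        "-r", str(retries),   # extra attempts after the first one
        host, oid,
    ]


def probe(host, oid, attempts=20, runner=subprocess.run, **kwargs):
    """Run snmpget `attempts` times and return how many calls failed.

    snmpget exits with a non-zero status when it gets no usable reply
    (for example on a timeout), so the return code is enough here.
    `runner` is injectable so the function can be exercised without a device.
    """
    cmd = build_snmpget_cmd(host, oid, **kwargs)
    failures = 0
    for _ in range(attempts):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, `probe("10.xx.xx.xx", "IF-MIB::ifInOctets.1", timeout=3)` followed by the same call with `timeout=6` would show whether a longer timeout reduces the number of failed polls (substitute the real router address and interface index).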
-
-
::
How many hops away are the SNMP devices that report missing connections? Are they on a high-latency network? Is it possible that you have connection cuts to them?
Put an ICMP latency module on those devices with the same interval you’re using for the SNMP modules. What interval is it?
After doing that and getting some data, try raising snmp_timeout to 15. I have experienced problems with some SNMP agents responding too slowly to repeated SNMP polling (some Check Point equipment and older switches under high load).
-
::
The device that I am monitoring is our router. There are no IP hops between the Pandora server and this device. The traffic does pass through several high-traffic switches, but these devices are far from exceeding their maximum capacity.
I set up an ICMP monitor on the same interval as the SNMP monitoring (300). It generally returned under 2 ms, but very rarely it would hit as high as 100 ms, or 0, which I believe indicates a timeout. We are also monitoring this device using Nagios and Cacti, and they do not appear to exhibit similar problems.
I bumped the SNMP timeout up to 15 as you suggested, but that does not seem to have had any effect.
Here is the result of running `ping -f` from the Pandora server to the router that I am trying to monitor:
--- 10.xx.xx.xx ping statistics ---
288096 packets transmitted, 288095 received, 0% packet loss, time 201441ms
rtt min/avg/max/mdev = 0.000/0.649/66.694/0.782 ms, pipe 4, ipg/ewma 0.699/0.542 ms
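For comparison with the SNMP timeout, the summary above can be reduced to a loss count and a worst-case round-trip time. This is a minimal sketch that assumes the Linux `ping` summary format shown in the post; `parse_ping_summary` is a hypothetical helper name.

```python
import re

# Summary lines copied from the ping output in the post.
PING_SUMMARY = """\
288096 packets transmitted, 288095 received, 0% packet loss, time 201441ms
rtt min/avg/max/mdev = 0.000/0.649/66.694/0.782 ms, pipe 4, ipg/ewma 0.699/0.542 ms
"""


def parse_ping_summary(text):
    """Extract packet counts and round-trip times (ms) from a ping summary."""
    counts = re.search(r"(\d+) packets transmitted, (\d+) received", text)
    rtt = re.search(
        r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", text
    )
    sent, received = (int(g) for g in counts.groups())
    rtt_min, rtt_avg, rtt_max, rtt_mdev = (float(g) for g in rtt.groups())
    return {
        "sent": sent,
        "received": received,
        "lost": sent - received,
        "loss_pct": 100.0 * (sent - received) / sent,
        "rtt_min_ms": rtt_min,
        "rtt_avg_ms": rtt_avg,
        "rtt_max_ms": rtt_max,
        "rtt_mdev_ms": rtt_mdev,
    }


stats = parse_ping_summary(PING_SUMMARY)
print(f"lost {stats['lost']} of {stats['sent']} packets "
      f"({stats['loss_pct']:.5f}%), worst RTT {stats['rtt_max_ms']} ms")
```

The worst round-trip time in this run is well under a tenth of a second, far below an SNMP timeout measured in whole seconds, so plain network latency between the server and the router seems an unlikely explanation for the zeros.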
-
-
::
This is really weird.
Some SNMP and task concurrency issues were reported in 1.3.1 and fixed in 2.0, but they only affect heavily loaded systems. How many modules do you have in your setup? How many of them are network modules?
One more possible solution: try lowering the number of network threads you have.
-
::
I am currently monitoring about 10 servers using ICMP to track uptime and SNMP to monitor link status and traffic on each NIC. I am also monitoring the traffic through our router using SNMP. This is probably between 50 and 100 modules in total.
I tried setting network_threads to 1, but that didn’t solve the problem.
EDIT:
I upgraded to version 2.0beta1 and I am still experiencing the same issue.