
  • Intermittent SNMP errors

    Posted by jncraton on September 15, 2008 at 12:52

    I recently installed and began testing Pandora as a replacement for Cacti and Nagios. I am currently using it to monitor several servers as well as our main router. It seems to be working fine, except that the bandwidth data polled via SNMP (ifInOctets/ifOutOctets) often comes back as zero instead of the proper value. For example, here is some of the output from our router:

    2008-09-15 13:45:58	4.18 M
    2008-09-15 13:40:42	4.39 M
    2008-09-15 13:35:27	0
    2008-09-15 13:30:12	4.37 M
    2008-09-15 13:24:57	4.53 M
    2008-09-15 13:19:42	0
    2008-09-15 13:14:27	5.13 M
    2008-09-15 13:09:12	0
    2008-09-15 13:03:57	4.66 M
    2008-09-15 12:58:42	3.71 M
    2008-09-15 12:53:27	0
    2008-09-15 12:48:12	4.74 M
    2008-09-15 12:42:57	4.98 M
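
    A minimal sketch for quantifying the pattern in the listing above: it parses the same thirteen samples, counts the zero readings, and measures the spacing between polls. The names `SAMPLES` and `parse` are illustrative only and are not part of Pandora.

```python
from datetime import datetime

# The samples exactly as listed above (newest first): timestamp, then value.
SAMPLES = """\
2008-09-15 13:45:58\t4.18 M
2008-09-15 13:40:42\t4.39 M
2008-09-15 13:35:27\t0
2008-09-15 13:30:12\t4.37 M
2008-09-15 13:24:57\t4.53 M
2008-09-15 13:19:42\t0
2008-09-15 13:14:27\t5.13 M
2008-09-15 13:09:12\t0
2008-09-15 13:03:57\t4.66 M
2008-09-15 12:58:42\t3.71 M
2008-09-15 12:53:27\t0
2008-09-15 12:48:12\t4.74 M
2008-09-15 12:42:57\t4.98 M
"""


def parse(text):
    """Return a list of (timestamp, raw value string) tuples."""
    rows = []
    for line in text.strip().splitlines():
        # Split into date, time, and the remainder (the value, e.g. "4.18 M" or "0").
        date_part, time_part, value = line.split(None, 2)
        stamp = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
        rows.append((stamp, value.strip()))
    return rows


rows = parse(SAMPLES)
# Timestamps of the polls that reported zero.
zeros = [stamp for stamp, value in rows if value == "0"]
# Seconds between consecutive polls (the list is newest first).
gaps = [(newer - older).total_seconds() for (newer, _), (older, _) in zip(rows, rows[1:])]

print(f"{len(zeros)} of {len(rows)} samples are zero")
print(f"poll spacing: {min(gaps):.0f} to {max(gaps):.0f} seconds")
```

    The zero readings are isolated single polls rather than a sustained outage, and the poll spacing stays steady throughout.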
    

    I am currently running the Pandora VMware image (v1.3.1) on an ESX server. I’m not sure if this is specifically a Pandora problem, as the errors tend to be much less frequent when bandwidth usage is lower, but any help with this issue would be appreciated.

    Thanks in advance.

  • 8 Replies
  • Sancho

    Administrator
    September 15, 2008 at 17:39

    Try raising the SNMP network timeout a bit in /etc/pandora_server.conf. This looks like a problem contacting your SNMP device, probably caused by delays in the connection.

    If this doesn’t work (I’m almost sure it will solve your problem), use snmp_checks 2 instead of 1 (the default).

    Default values are:

    snmp_checks 1
    snmp_timeout 3

    First try with:

    snmp_checks 1
    snmp_timeout 6

    If that doesn’t work, try:

    snmp_checks 2
    snmp_timeout 5

    I hope this helps.

  • jncraton

    Member
    September 16, 2008 at 09:31

    Thanks for the advice. Unfortunately, it didn’t solve the problem. I now have my pandora_server.conf set like this:

    snmp_checks 2
    snmp_timeout 5

    I am still experiencing the same problems with SNMP. Any other ideas?

  • Sancho

    Administrator
    September 18, 2008 at 06:51

    How many hops away are the SNMP devices that report missing connections? Are they on a high-latency network? Is it possible that the connection to them is being cut?

    Put an ICMP latency module on those devices with the same interval you’re using for the SNMP modules. What interval is it?

    After doing that and gathering some data, try raising snmp_timeout to 15. I have experienced problems with some SNMP agents responding too slowly to repeated SNMP polling (some Check Point equipment and older switches under high load).

  • jncraton

    Member
    September 19, 2008 at 12:29

    The device that I am monitoring is our router. There are no IP hops between the Pandora server and this device. The traffic does pass through several high-traffic switches, but these devices are far from exceeding their maximum capacity.

    I set up an ICMP monitor on the same interval as the SNMP monitoring (300). It generally returned under 2ms, but very rarely it would hit as high as 100ms, or 0, which I believe indicates a timeout. We are also monitoring this device using Nagios and Cacti, and they do not appear to exhibit similar problems.

    I bumped up the SNMP timeout to 15 as you suggested, but that does not seem to have had any effect.

    Here is the result of running `ping -f` from the Pandora server to the router that I am trying to monitor:

    --- 10.xx.xx.xx ping statistics ---
    288096 packets transmitted, 288095 received, 0% packet loss, time 201441ms
    rtt min/avg/max/mdev = 0.000/0.649/66.694/0.782 ms, pipe 4, ipg/ewma 0.699/0.542 ms

  • Sancho

    Administrator
    September 21, 2008 at 09:30

    What SNMP query are you running against the router?

  • jncraton

    Member
    September 22, 2008 at 12:47

    I am querying both the in and out octets on port 1 of the router. The OID for inbound octets is .1.3.6.1.2.1.2.2.1.10.10101. I don’t seem to have any problems using snmpwalk from the Pandora server to the router.
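
    One thing that may be worth ruling out, offered as an assumption rather than anything established in this thread: .1.3.6.1.2.1.2.2.1.10 is ifInOctets, a 32-bit counter that wraps back to zero after 2^32 octets. If a poller takes a plain difference between two readings, a wrap between polls produces a negative number, which some tools discard or report as 0. The sketch below shows a wrap-aware delta. The function names are hypothetical and this is not Pandora's implementation.

```python
# ifInOctets is a Counter32, so it rolls over after 2**32 octets.
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets transferred between two readings, allowing for at most one wrap."""
    # Modular subtraction: if current < previous, the counter is assumed
    # to have wrapped exactly once since the previous poll.
    return (current - previous) % modulus


def rate_per_second(previous, current, interval_seconds):
    """Average octets per second over one polling interval."""
    return counter_delta(previous, current) / interval_seconds


# No wrap: a plain difference and the wrap-aware delta agree.
print(counter_delta(1_000, 4_000))          # 3000
# Wrap: the plain difference would be negative; the wrap-aware delta is not.
print(counter_delta(4_294_967_000, 704))    # 1000
```

    This only recovers a single wrap per interval. On a link busy enough to wrap more than once between polls, a shorter polling interval or a 64-bit counter (where the device offers one) would be needed.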

  • Sancho

    Administrator
    September 26, 2008 at 16:14

    This is really weird.

    Some SNMP and task concurrency issues were reported in 1.3.1 and fixed in 2.0, but they only affect heavily loaded systems. How many modules do you have in your setup? How many of them are network modules?

    One more possible solution: try lowering the number of network threads you have.

  • jncraton

    Member
    September 29, 2008 at 12:43

    I am currently monitoring about 10 servers using ICMP to track uptime and SNMP to monitor link status and traffic on each NIC. I am also monitoring the traffic through our router using SNMP. This is probably between 50 and 100 modules in total.

    I tried setting network_threads to 1, but that didn’t solve the problem.

    EDIT:

    I upgraded to version 2.0beta1 and I am still experiencing the same issue.