
  • Missing Network polls

    Posted by Raul77 on February 21, 2008 at 13:09

    Hi
    I have a problem with the network server. I’ve set up a Pandora system to collect data from Cisco routers and servers. With the agent on Linux or Windows, Pandora works fine, but with the routers it works in a jerky way.
    For the routers I am using ICMP and SNMP monitors.
    When all routers are up, there is no problem and all monitors respond to the server. But if one router goes down, the whole monitoring system for the routers becomes unstable and loses many of the polls on the routers that are working fine. It seems that the polling system hits timeouts while waiting for the downed router, and this affects the rest of the system.
    We appreciate any comments, and congratulations on the system.

  • Sancho

    Administrator
    February 21, 2008 at 16:25

    There are a lot of server parameters for tuning this behaviour, and version 1.3.1 introduced many more. I recommend trying 1.3.1: it is not a finished version, but it has been tested for many weeks and is very stable.

    http://www.openideas.info/wiki/index.php?title=Pandora_1.3:Documentation_en:1.3.1_new

    Get a tarball with a snapshot of version 1.3.1 at:

    http://artica.homelinux.com/pandora_tarball/1.3.x/?C=M;O=D

    Some quick advice:

    – Set the global timeout to 3 or 4 seconds.
    – Set a higher number of threads: approximately 10 threads per 100 modules.
    – Use the new snmp_timeout and snmp_checks options in version 1.3.1. SNMP has a longer timeout than ICMP.
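
The sizing advice above can be turned into a quick back-of-the-envelope calculation. The following is only a sketch in Python; the function names are hypothetical and not part of Pandora FMS. It assumes that each polling attempt against an unreachable device waits for the full timeout, which is one plausible reading of why a single downed router can hold up polls for healthy ones.

```python
def suggested_threads(module_count, threads_per_100=10, minimum=1):
    """Rule of thumb from the advice above: roughly 10 threads per 100 modules."""
    # Integer ceiling division so that 101 modules round up to 11 threads.
    return max(minimum, (module_count * threads_per_100 + 99) // 100)


def worst_case_block_seconds(timeout_s, attempts):
    """Upper bound on how long one poll of an unreachable device holds a thread.

    Assumption: every attempt waits for the full timeout before giving up.
    """
    return timeout_s * attempts


if __name__ == "__main__":
    # Hypothetical setup: 250 modules, 4-second timeout, a single attempt.
    print(suggested_threads(250))
    print(worst_case_block_seconds(4, 1))
```

Treat the thread count as a starting point only: later in this thread a much lower count (20 to 40 threads) turns out to be enough, and a very high count causes problems of its own.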

  • Raul77

    Member
    February 21, 2008 at 21:13

    Thank you for the quick response.
    I’ve just tried the new version of the server. I’ve also configured the database connection and the parameters related to the timeouts and the threads. Although the server started fine, it went down within a few minutes, so I had to go back to version 1.3. Have I done something wrong?

    The pandora_server.conf configuration file I used with 1.3.1 is:

    #############################################################################
    # Pandora Server Parameters, please change it for your setup needs
    #############################################################################

    # Servername: Name of this server
    # if not given, it takes hostname. It’s preferable to setup one
    # because machine name could change by some reason.

    #servername endor

    # incomingdir: Defines directory where incoming data packets are stored
    # You could set directory relative to base path or absolute, starting with /

    incomingdir /var/spool/pandora/data_in

    # log_file: Main logfile for pandora_server
    # You could set file relative to base path or absolute, starting with /

    log_file /var/log/pandora/pandora_server.log

    # Log file for Pandora FMS SNMP console. Its generated by NetSNMP Trap daemon

    snmp_logfile /var/log/pandora/pandora_snmptrap.log

    # Error logfile: aux logfile for pandora_server errors (in Daemon mode)
    # You could set file relative to base path or absolute, starting with /

    errorlog_file /var/log/pandora/pandora_server.error

    # dbname: Database name (pandora by default

    dbname pandora

    # dbuser: Database user name (pandora by default)

    dbuser pandora

    # daemon: Runs in daemon mode (background) if 1, if 0 runs in foreground
    # this could be setup on command line with -D option

    # daemon 1

    # dbpass: Database password

    dbpass XXXXXX

    # dbhost: Database hostname or IP address

    dbhost localhost

    # verbosity: level of detail on errors/messages (0 default, 1 verbose, 2 debug…. 10 noisy)
    # -v in command line (verbose) or -d (debug)

    verbosity 1

    # Alert threshold

    alert_threshold 45

    # Master Server, 1 if master server (normal mode), 0 for slave mode (slave in multi-server setup)

    master 1

    # Check datafiles using a MD5 hash, 1 to check (default), 0 to ignore .checksum

    checksum 0

    # Activate Pandora SNMP console (depending on snmptrapd)
    # only available on Pandora Network server

    snmpconsole 1

    # Activate (1) Pandora Network Server

    networkserver 1

    # Activate (1) Pandora Data Server

    dataserver 1

    # Activate (1) Pandora FMS Recon server

    reconserver 1

    # Network timeout (in seconds) for timeout in network connections for Network agents

    network_timeout 4

    # Server keepalive (in seconds)

    server_keepalive 50

    # Server Threshold: defines number of seconds of main loop (in sec)

    server_threshold 15

    network_threads 200

    # icmp_checks x : defines number of pings for each icmp_proc module type. at least one of that ping should be 1 to report 1
    icmp_checks 1

    # alert_recovery 1 | 0 : Defines if Pandora FMS launch another alert when alert condition is recovered. It
    # has the same field1, but adds “[RECOVER]” to field2 and field3. Is disabled by default.

    alert_recovery 1

    # tcp specific options :
    # tcp_checks: number of tcp retries if first attempt fails.
    # tcp_timeout: specific timeout for tcp connections

    tcp_checks 2
    tcp_timeout 5

    # snmp specific options :
    # snmp_checks: number of snmp request retries if first attempt fails.
    # snmp_timeout: specific timeout for snmp request.

    snmp_checks 2
    snmp_timeout 5

    # snmp_proc_deadresponse 1 (default): Return DOWN if cannot contact
    # or receive NULL from a SNMP PROC module.

    snmp_proc_deadresponse 1
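
For anyone comparing settings across versions, the file above follows a simple layout: one "key value" pair per line, with "#" starting a comment. The Python sketch below reads that layout into a dictionary so that the tuning values are easy to inspect. `parse_server_conf` is a hypothetical helper written for this thread, not the parser Pandora FMS itself uses, and the real server may handle edge cases differently.

```python
def parse_server_conf(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace: key first, the rest is the value.
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


# A few lines taken from the configuration posted above.
sample = """
# daemon 1
network_timeout 4
network_threads 200
snmp_checks 2
snmp_timeout 5
"""

conf = parse_server_conf(sample)

if __name__ == "__main__":
    for key in ("network_timeout", "network_threads", "snmp_checks", "snmp_timeout"):
        print(key, "=", conf[key])
```

Values are kept as strings; convert them with `int()` where a number is needed.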

  • Sancho

    Administrator
    February 25, 2008 at 01:15

    It seems there is a problem with daemon startup, probably some error:

    1. Check the logs (/var/log/pandora/)
    2. Start the daemon manually:
    pandora_network /etc/pandora/pandora_server.conf
    3. Start the process and check whether it is alive with ps aux | grep pandora

  • Raul77

    Member
    February 26, 2008 at 14:18

    Hi

    I’ve restarted the services and found this in the log files:

    /etc/pandora/pandora_server.conf

    2008/02/26 10:49:39 fontroja3_Data Launching Pandora FMS Data Server 1.3.1GX PS080210
    2008/02/26 10:49:53 fontroja3_SNMP Launching Pandora FMS SNMP Console 1.3.1GX PS080210
    2008/02/26 10:49:53 fontroja3_SNMP [V1] New SNMP logfile detected, resetting index
    2008/02/26 10:49:54 fontroja3_Net Pandora FMS Network Server Shutdown by signal
    2008/02/26 10:49:54 fontroja3_Net Launching Pandora FMS Network Server 1.3.1GX PS080210
    2008/02/26 10:49:56 fontroja3_Data Launching Pandora FMS Data Server 1.3.1GX PS080210
    2008/02/26 10:49:58 fontroja3_Recon Launching Pandora FMS Recon Server 1.3.1GX PS080210
    2008/02/26 10:51:38 fontroja3_SNMP [V1] Server fontroja3_Net going Down
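
Log lines like the ones above appear to follow a "date time server_name message" layout, so shutdown events can be picked out mechanically. The Python sketch below assumes that layout, inferred only from the lines quoted in this post. The helper names are hypothetical and the two search strings are simply the ones visible in this log.

```python
import re
from datetime import datetime

# Assumed layout: "YYYY/MM/DD HH:MM:SS <server_name> <message>"
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def parse_line(line):
    """Return (timestamp, server_name, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
    return stamp, match.group(2), match.group(3)


def shutdown_events(lines):
    """Keep only the entries whose message reports a server stopping."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        message = parsed[2]
        if "going Down" in message or "Shutdown by signal" in message:
            events.append(parsed)
    return events


log = [
    "2008/02/26 10:49:53 fontroja3_SNMP Launching Pandora FMS SNMP Console 1.3.1GX PS080210",
    "2008/02/26 10:49:54 fontroja3_Net Pandora FMS Network Server Shutdown by signal",
    "2008/02/26 10:49:54 fontroja3_Net Launching Pandora FMS Network Server 1.3.1GX PS080210",
    "2008/02/26 10:51:38 fontroja3_SNMP [V1] Server fontroja3_Net going Down",
]

if __name__ == "__main__":
    for stamp, server, message in shutdown_events(log):
        print(stamp.isoformat(), server, message)
```

Note that the second field is the server that wrote the line, not necessarily the one that stopped: in the last entry the SNMP console is reporting that the Network server went down.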

    It seems as if the problem is with the SNMP server rather than the Network server.

    I’ve read the log:

    pandora_snmptrap.log

    Warning: no access control information configured.
    This receiver will *NOT* accept any incoming notifications.
    NET-SNMP version 5.4.rc2

    What does this mean? Is there a problem with “NET-SNMP version 5.4.rc2”?

    Thank you very much for the help

  • raul

    Member
    February 26, 2008 at 16:05

    I think it is an SNMP configuration problem. Read this:

    http://www.mail-archive.com/[email protected]/msg12499.html

    Raúl

  • Raul77

    Member
    February 26, 2008 at 19:26

    Hi
    I’ve tried to solve the problem with SNMP. Now the error in pandora_snmptrap.log no longer appears, but pandora_network_server has stopped again. I set the pandora_server log level to “noisy”, and here are a few lines from it:

    2008/02/26 15:55:31 fontroja3_SNMP [V5] EVENT Insertion: INSERT INTO tevento (id_agente, id_grupo, evento, timestamp, estado, utimestamp) VALUES (0, 0, ‘Server fontroja3_Net going Down’, ‘2008-02-26 15:55:31’, 0, 1204037731)
    2008/02/26 15:55:31 fontroja3_SNMP [V1] Server fontroja3_Net going Down
    2008/02/26 15:55:36 fontroja3_Recon [V10] Loop in Recon Module Subsystem

    As you can see, the server goes down again, and there is little in the logs that could explain the error.
    If you do not mind, I think I should try to explain the problem again: with version 1.3 I lose a lot of ICMP and SNMP polls when one ICMP- or SNMP-managed agent goes down. Now I am trying to run version 1.3.1 of the program, but maybe there is another way to solve the problem, because I don’t think the problem is in your 1.3 system.

    Thanks very much for your help.

  • Sancho

    Administrator
    February 26, 2008 at 19:38

    Mmm. Try to run the pandora_network process WITHOUT the daemon launcher, for example:

    /usr/bin/pandora_network /etc/pandora/pandora_server.conf

    By default the logs are in /var/log/pandora_server.error, not pandora_snmptrap.log (that is an internal file that logs all received traps, which is a different issue).

    I think you have a problem that makes your server go down at startup, and we are missing the error it reports at that point.

    I hope this helps…

  • Raul77

    Member
    February 26, 2008 at 20:03

    I’ve just done it, and here are the last lines of the output:

    Starting up Network Consumer Thread # 99
    All threads loaded and running
    Starting up Network Producer Thread …
    Segmentation fault

  • Raul77

    Member
    February 26, 2008 at 20:14

    Although the “Segmentation fault” message appears at the end of the execution, the process keeps working. The error was in the startup scripts, as you said.
    At this point I will try to modify the timeout and thread parameters and see what happens when one of the managed routers goes down.

  • Sancho

    Administrator
    February 27, 2008 at 13:50

    Damn… 100 threads is A LOT. Try setting it to 20 or 30 first. You don’t need that many, believe me (we have tested it with more than 2000 modules and no more than 40 threads, and it runs fast enough).

  • Raul77

    Member
    March 3, 2008 at 16:05

    Hi
    I’ve set 40 threads and there aren’t any problems with the server. The problem of losing polls is solved with the new server version. Thanks for your help.

  • Sancho

    Administrator
    March 4, 2008 at 13:56

    Great !!!