-
Missing Network polls
Posted by Raul77 on February 21, 2008 at 13:09
Hi
I have a problem with the network server. I’ve set up a Pandora system to collect data from Cisco routers and servers. With the agent on Linux or Windows, Pandora works fine, but with the routers it works erratically.
For the routers I am using ICMP and SNMP monitors.
When all routers are up, there is no problem and all monitors respond to the server. But if one router goes down, the whole monitoring system for the routers becomes unstable and loses many of the polls on routers that are working fine. It seems that the polling system hits timeouts waiting for the downed router, and this affects the rest of the system.
We appreciate any comments, and congratulations on the system.
-
::
You have a lot of server parameters to tune this behaviour. Version 1.3.1 has introduced a lot of options there. I recommend trying 1.3.1: it is not a finished version, but it has been tested for many weeks and it’s very stable.
http://www.openideas.info/wiki/index.php?title=Pandora_1.3:Documentation_en:1.3.1_new
Get a tarball with a snapshot of 1.3.1 version at:
http://artica.homelinux.com/pandora_tarball/1.3.x/?C=M;O=D
Some quick advice:
– Set the global timeout to 3 or 4 secs.
– Set a higher number of threads: approx. 10 threads for every 100 modules.
– Use the new snmp_timeout and snmp_checks options of version 1.3.1. SNMP needs a longer timeout than ICMP.
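The rule of thumb above (about 10 threads per 100 modules) can be turned into a quick calculation. This is a minimal Python sketch, not part of Pandora FMS; the function name `suggest_network_threads` and the floor of 10 threads are assumptions made for illustration only.

```python
def suggest_network_threads(module_count, threads_per_100=10, minimum=10):
    """Apply the 'about 10 threads per 100 modules' rule of thumb.

    The minimum of 10 threads is an assumption, not a Pandora FMS default.
    """
    if module_count < 0:
        raise ValueError("module_count must be non-negative")
    # Integer ceiling division, so 101 modules round up to 11 threads.
    suggested = (module_count * threads_per_100 + 99) // 100
    return max(minimum, suggested)


if __name__ == "__main__":
    modules = 250  # hypothetical number of network modules
    print(f"network_threads {suggest_network_threads(modules)}")
```

Treat the result as a starting point only: a later reply in this thread found 100 threads to be too many and suggests starting with 20 or 30.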
::
Thank you for the quick response.
I’ve just tried the new version of the server. I’ve also configured the database connection and the parameters for the timeouts and the threads. Although the server started well, it went down after a few minutes, so I’ve had to go back to version 1.3. Have I done something wrong?
The pandora_server.conf file that I used with 1.3.1 is:
#############################################################################
# Pandora Server Parameters, please change it for your setup needs
#############################################################################
# Servername: Name of this server
# if not given, it takes hostname. It’s preferable to setup one
# because machine name could change by some reason.
#servername endor
# incomingdir: Defines directory where incoming data packets are stored
# You could set directory relative to base path or absolute, starting with /
incomingdir /var/spool/pandora/data_in
# log_file: Main logfile for pandora_server
# You could set file relative to base path or absolute, starting with /
log_file /var/log/pandora/pandora_server.log
# Log file for Pandora FMS SNMP console. Its generated by NetSNMP Trap daemon
snmp_logfile /var/log/pandora/pandora_snmptrap.log
# Error logfile: aux logfile for pandora_server errors (in Daemon mode)
# You could set file relative to base path or absolute, starting with /
errorlog_file /var/log/pandora/pandora_server.error
# dbname: Database name (pandora by default
dbname pandora
# dbuser: Database user name (pandora by default)
dbuser pandora
# daemon: Runs in daemon mode (background) if 1, if 0 runs in foreground
# this could be setup on command line with -D option
# daemon 1
# dbpass: Database password
dbpass XXXXXX
# dbhost: Database hostname or IP address
dbhost localhost
# verbosity: level of detail on errors/messages (0 default, 1 verbose, 2 debug…. 10 noisy)
# -v in command line (verbose) or -d (debug)
verbosity 1
# Alert threshold
alert_threshold 45
# Master Server, 1 if master server (normal mode), 0 for slave mode (slave in multi-server setup)
master 1
# Check datafiles using a MD5 hash, 1 to check (default), 0 to ignore .checksum
checksum 0
# Activate Pandora SNMP console (depending on snmptrapd)
# only available on Pandora Network server
snmpconsole 1
# Activate (1) Pandora Network Server
networkserver 1
# Activate (1) Pandora Data Server
dataserver 1
# Activate (1) Pandora FMS Recon server
reconserver 1
# Network timeout (in seconds) for timeout in network connections for Network agents
network_timeout 4
# Server keepalive (in seconds)
server_keepalive 50
# Server Threshold: defines number of seconds of main loop (in sec)
server_threshold 15
network_threads 200
# icmp_checks x : defines number of pings for each icmp_proc module type. at least one of that ping should be 1 to report 1
icmp_checks 1
# alert_recovery 1 | 0 : Defines if Pandora FMS launch another alert when alert condition is recovered. It
# has the same field1, but adds "[RECOVER]" to field2 and field3. Is disabled by default.
alert_recovery 1
# tcp specific options :
# tcp_checks: number of tcp retries if first attempt fails.
# tcp_timeout: specific timeout for tcp connections
tcp_checks 2
tcp_timeout 5
# snmp specific options :
# snmp_checks: number of snmp request retries if first attempt fails.
# snmp_timeout: specific timeout for snmp request.
snmp_checks 2
snmp_timeout 5
# snmp_proc_deadresponse 1 (default): Return DOWN if cannot contact
# or receive NULL from a SNMP PROC module.
snmp_proc_deadresponse 1
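The file above uses a simple `key value` layout with `#` comments. As a hedged illustration (this is not the parser Pandora FMS itself uses), a short Python sketch can read such a file and print the timeout and thread settings discussed in this thread. The function name `parse_pandora_conf` and the `SAMPLE` text are hypothetical.

```python
def parse_pandora_conf(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


# A few lines in the same layout as the configuration posted above.
SAMPLE = """\
# Network timeout (in seconds)
network_timeout 4
network_threads 200
# daemon 1
snmp_checks 2
snmp_timeout 5
"""

if __name__ == "__main__":
    conf = parse_pandora_conf(SAMPLE)
    for name in ("network_timeout", "network_threads", "snmp_checks", "snmp_timeout"):
        print(name, conf.get(name, "(not set)"))
```

Printing the values this way makes it easy to spot a setting such as a very high network_threads before starting the server.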
-
::
It seems there is a problem with daemon startup, probably some error:
1. Check logs (/var/log/pandora/)
2. Start daemon manually:
pandora_network /etc/pandora/pandora_server.conf
3. Start the process and check that it is alive with ps aux | grep pandora
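The check in step 3 can also be scripted. The sketch below is only an illustration in Python: it filters the output of `ps aux` for commands containing a given name, assuming the usual 11-column `ps aux` layout. The helper name `find_processes` is hypothetical.

```python
import subprocess


def find_processes(ps_output, name):
    """Return (pid, command) pairs from `ps aux` output whose command contains name."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # 11 columns assumed; the last one is COMMAND
        if len(fields) == 11 and name in fields[10]:
            matches.append((int(fields[1]), fields[10]))
    return matches


if __name__ == "__main__":
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    running = find_processes(result.stdout, "pandora")
    if running:
        for pid, command in running:
            print(pid, command)
    else:
        print("no pandora process found")
```

Running it a few minutes after starting the server shows whether the network server process is still alive.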
::
Hi
I’ve restarted the services and in the log files I’ve found this:
/etc/pandora/pandora_server.conf
2008/02/26 10:49:39 fontroja3_Data Launching Pandora FMS Data Server 1.3.1GX PS080210
2008/02/26 10:49:53 fontroja3_SNMP Launching Pandora FMS SNMP Console 1.3.1GX PS080210
2008/02/26 10:49:53 fontroja3_SNMP [V1] New SNMP logfile detected, resetting index
2008/02/26 10:49:54 fontroja3_Net Pandora FMS Network Server Shutdown by signal
2008/02/26 10:49:54 fontroja3_Net Launching Pandora FMS Network Server 1.3.1GX PS080210
2008/02/26 10:49:56 fontroja3_Data Launching Pandora FMS Data Server 1.3.1GX PS080210
2008/02/26 10:49:58 fontroja3_Recon Launching Pandora FMS Recon Server 1.3.1GX PS080210
2008/02/26 10:51:38 fontroja3_SNMP [V1] Server fontroja3_Net going Down
It seems as if the problem is with the SNMP server rather than the Network server.
I’ve read the log:
pandora_snmptrap.log
Warning: no access control information configured.
This receiver will *NOT* accept any incoming notifications.
NET-SNMP version 5.4.rc2
What does this mean? Is there a problem with “NET-SNMP version 5.4.rc2”?
Thank you very much for the help
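When the log grows, the shutdown lines are easy to miss. The following Python sketch, offered only as an illustration, scans log text in the `date time server message` layout shown above and lists the lines that mention a server stopping. The function name `find_stop_events` and the choice of marker strings are assumptions taken from the lines in this post.

```python
from datetime import datetime

# Phrases seen in the log excerpt above when a server stops.
STOP_MARKERS = ("going Down", "Shutdown by signal")


def find_stop_events(log_text):
    """Return (timestamp, server, message) for lines that report a server stopping."""
    events = []
    for line in log_text.splitlines():
        parts = line.split(None, 3)  # date, time, server name, rest of the line
        if len(parts) < 4:
            continue
        date_part, time_part, server, message = parts
        try:
            stamp = datetime.strptime(f"{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped log line
        if any(marker in message for marker in STOP_MARKERS):
            events.append((stamp, server, message))
    return events


SAMPLE_LOG = """\
2008/02/26 10:49:54 fontroja3_Net Pandora FMS Network Server Shutdown by signal
2008/02/26 10:49:54 fontroja3_Net Launching Pandora FMS Network Server 1.3.1GX PS080210
2008/02/26 10:51:38 fontroja3_SNMP [V1] Server fontroja3_Net going Down
"""

if __name__ == "__main__":
    for stamp, server, message in find_stop_events(SAMPLE_LOG):
        print(stamp.isoformat(), server, message)
```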
-
::
I think it is an SNMP configuration problem. Read this:
http://www.mail-archive.com/[email protected]/msg12499.html
Raúl
-
::
hi
I’ve tried to solve the problem with SNMP. Now the error in pandora_snmptrap.log no longer appears, but pandora_network_server has stopped again. I set the pandora_server log level to “noisy”, and here are a few lines from it:
2008/02/26 15:55:31 fontroja3_SNMP [V5] EVENT Insertion: INSERT INTO tevento (id_agente, id_grupo, evento, timestamp, estado, utimestamp) VALUES (0, 0, 'Server fontroja3_Net going Down', '2008-02-26 15:55:31', 0, 1204037731)
2008/02/26 15:55:31 fontroja3_SNMP [V1] Server fontroja3_Net going Down
2008/02/26 15:55:36 fontroja3_Recon [V10] Loop in Recon Module Subsystem
As you can see, the server goes down again and there is little in the logs that could explain the error.
If you do not mind, I think I should explain the problem again: in version 1.3, a lot of ICMP and SNMP polls are lost when one ICMP- or SNMP-managed agent goes down. I am now trying to run version 1.3.1, but maybe there is another way to solve the problem, because I don’t think the problem is in your 1.3 system.
Thanks very much for your help.
-
::
Mmm. Try to run the pandora_network process WITHOUT the daemon launcher, for example:
/usr/bin/pandora_network /etc/pandora/pandora_server.conf
By default, logs go to /var/log/pandora_server.error, not pandora_snmptrap.log (that is an internal file that logs all received traps, which is a different issue).
I think you have a problem that makes your server go down at startup, and we are missing the error at startup.
I hope this helps…
-
-
::
Although the message “segmentation fault” appears at the end of the execution, the process continues working. The error was in the startup scripts, as you said.
At this point I will try to modify the timeout and thread parameters and see what happens when one of the managed routers goes down.
::
I’ve just done it, and here are the last lines of the result:
- Starting up Network Consumer Thread # 99
- All threads loaded and running
- Starting up Network Producer Thread …
Segmentation fault
Damn… 100 threads is A LOT. Try setting it to 20 or 30 first. You don’t need so many, believe me (we have tested it with more than 2000 modules and no more than 40 threads, and it goes fast enough).
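To get a feel for how the thread count and timeouts interact, here is a rough back-of-the-envelope model in Python. It is not how Pandora FMS schedules polls: it simply assumes that every poll against a dead device blocks one thread for timeout × attempts seconds and that work is spread evenly across threads. The function name and all example numbers are hypothetical.

```python
def estimate_sweep_seconds(healthy, dead, threads, healthy_secs, timeout_secs, attempts):
    """Idealised time to poll every module once, with work spread evenly over threads.

    Assumes each dead module blocks a thread for timeout_secs * attempts,
    and each healthy module answers in healthy_secs.
    """
    if threads <= 0:
        raise ValueError("threads must be positive")
    busy_seconds = healthy * healthy_secs + dead * timeout_secs * attempts
    return busy_seconds / threads


if __name__ == "__main__":
    # Hypothetical numbers: 2000 modules, 20 of them on a router that is down.
    for threads in (20, 40):
        secs = estimate_sweep_seconds(
            healthy=1980, dead=20, threads=threads,
            healthy_secs=0.1, timeout_secs=5, attempts=2,
        )
        print(f"{threads} threads: about {secs:.1f} s per sweep")
```

In this model, lowering the timeout or the number of attempts shrinks the time lost on dead devices directly, which is why those settings are worth tuning before adding more threads.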
-
-