Missing Network polls

Posted by Raul77 on February 21, 2008 at 13:09

Hi
I have a problem with the network server. I’ve implemented a Pandora system to collect data from Cisco routers and servers. With the agent on Linux or Windows, Pandora works fine, but with the routers it works erratically.
For the routers I am using ICMP and SNMP monitors.
When all routers are up, there is no problem and all monitors respond to the server. But if one router goes down, the whole router monitoring becomes unstable and loses many of the polls on routers that are working fine. It seems as if the polling system gets stuck on timeouts while waiting for the downed router, and this affects the rest of the system.
We would appreciate any comments, and congratulations on the system.

Sancho

Administrator
February 21, 2008 at 16:25

There are many server parameters you can use to tune this behaviour, and version 1.3.1 introduced a lot of new ones. I recommend trying 1.3.1: it is not a finished release, but it has been tested for many weeks and it’s very stable.

http://www.openideas.info/wiki/index.php?title=Pandora_1.3:Documentation_en:1.3.1_new

You can get a tarball with a snapshot of version 1.3.1 at:

http://artica.homelinux.com/pandora_tarball/1.3.x/?C=M;O=D

Some quick advice:

– Set the global timeout to 3 or 4 seconds.
– Set a higher number of threads: approximately 10 threads for every 100 modules.
– Use the new snmp_timeout and snmp_checks options in version 1.3.1. SNMP needs a longer timeout than ICMP.

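The advice above can be turned into a rough capacity estimate. This is a back-of-the-envelope sketch under stated assumptions, not Pandora code: it assumes every one of the snmp_checks attempts against an unreachable device waits the full snmp_timeout, that live checks take a fixed average time, and that work spreads evenly across network threads. The function names and example figures (200 modules, one dead router carrying 20 of them, 50 ms per live check) are hypothetical.

```python
import math


def threads_rule_of_thumb(modules: int, threads_per_100: int = 10) -> int:
    """Roughly 10 network threads per 100 modules, as suggested above."""
    return math.ceil(modules * threads_per_100 / 100)


def worst_case_block_s(checks: int, timeout_s: float) -> float:
    """Seconds one thread may stay stuck on one module of an unreachable device.

    Assumption: each of the `checks` attempts waits the full timeout.
    """
    return checks * timeout_s


def cycle_time_s(live: int, dead: int, threads: int,
                 live_check_s: float, checks: int, timeout_s: float) -> float:
    """Approximate time to poll every module once, if work spreads evenly."""
    busy = live * live_check_s + dead * worst_case_block_s(checks, timeout_s)
    return busy / threads


if __name__ == "__main__":
    # Hypothetical setup; snmp_checks 2 and snmp_timeout 5 match the
    # pandora_server.conf posted in this thread.
    print(threads_rule_of_thumb(200))   # 20
    print(worst_case_block_s(2, 5))     # 10
    for threads in (2, 20):
        t = cycle_time_s(live=180, dead=20, threads=threads,
                         live_check_s=0.05, checks=2, timeout_s=5)
        print(f"{threads} threads -> ~{t:.1f} s per cycle")
```

With these made-up numbers, the dead modules alone cost about 10 s per thread per cycle at 20 threads but about 100 s at 2 threads, which is one way a single failed router can crowd out polls of healthy ones. The administrator later reports that far fewer threads than this rule suggests were enough in practice.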
Raul77

Member
February 21, 2008 at 21:13

Thank you for the quick response.
I’ve just tried the new version of the server. I’ve also configured the database connection and the parameters for timeouts and threads. Although the server started well, it went down within a few minutes, so I’ve had to go back to version 1.3. Have I done something wrong?

The pandora_server.conf configuration file I used with 1.3.1 is:

#############################################################################
# Pandora Server Parameters, please change it for your setup needs
#############################################################################

# Servername: Name of this server
# if not given, it takes the hostname. It’s preferable to set one
# because the machine name could change for some reason.

#servername endor

# incomingdir: Defines directory where incoming data packets are stored
# You could set directory relative to base path or absolute, starting with /

incomingdir /var/spool/pandora/data_in

# log_file: Main logfile for pandora_server
# You could set file relative to base path or absolute, starting with /

log_file /var/log/pandora/pandora_server.log

# Log file for Pandora FMS SNMP console. It’s generated by the NetSNMP trap daemon

snmp_logfile /var/log/pandora/pandora_snmptrap.log

# Error logfile: aux logfile for pandora_server errors (in Daemon mode)
# You could set file relative to base path or absolute, starting with /

errorlog_file /var/log/pandora/pandora_server.error

# dbname: Database name (pandora by default)

dbname pandora

# dbuser: Database user name (pandora by default)

dbuser pandora

# daemon: Runs in daemon mode (background) if 1, if 0 runs in foreground
# this could be setup on command line with -D option

# daemon 1

# dbpass: Database password

dbpass XXXXXX

# dbhost: Database hostname or IP address

dbhost localhost

# verbosity: level of detail on errors/messages (0 default, 1 verbose, 2 debug…. 10 noisy)
# -v in command line (verbose) or -d (debug)

verbosity 1

# Alert threshold

alert_threshold 45

# Master Server, 1 if master server (normal mode), 0 for slave mode (slave in multi-server setup)

master 1

# Check datafiles using an MD5 hash, 1 to check (default), 0 to ignore .checksum

checksum 0

# Activate Pandora SNMP console (depending on snmptrapd)
# only available on Pandora Network server

snmpconsole 1

# Activate (1) Pandora Network Server

networkserver 1

# Activate (1) Pandora Data Server

dataserver 1

# Activate (1) Pandora FMS Recon server

reconserver 1

# Network timeout (in seconds) for network connections of Network agents

network_timeout 4

# Server keepalive (in seconds)

server_keepalive 50

# Server Threshold: defines the duration of the main loop, in seconds

server_threshold 15

network_threads 200

# icmp_checks x : defines the number of pings for each icmp_proc module type. At least one of those pings must return 1 for the module to report 1
icmp_checks 1

# alert_recovery 1 | 0 : Defines whether Pandora FMS launches another alert when the alert condition recovers. It
# has the same field1, but adds “[RECOVER]” to field2 and field3. It is disabled by default.

alert_recovery 1

# tcp specific options :
# tcp_checks: number of tcp retries if first attempt fails.
# tcp_timeout: specific timeout for tcp connections

tcp_checks 2
tcp_timeout 5

# snmp specific options :
# snmp_checks: number of snmp request retries if first attempt fails.
# snmp_timeout: specific timeout for snmp request.

snmp_checks 2
snmp_timeout 5

# snmp_proc_deadresponse 1 (default): Return DOWN if cannot contact
# or receive NULL from a SNMP PROC module.

snmp_proc_deadresponse 1

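A minimal sketch for sanity-checking a pandora_server.conf like the one above: it reads "key value" lines, skips blanks and # comments, and flags two timing settings discussed in this thread. The 40-thread ceiling comes from the administrator's later reply, treating network_timeout as the "global timeout" from the earlier advice is an assumption, and the function names are hypothetical, not part of Pandora FMS.

```python
import sys


def parse_conf(text: str) -> dict:
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as value
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def review(conf: dict, max_threads: int = 40, max_timeout_s: float = 4) -> list:
    """Return warnings for thread count and network timeout (illustrative thresholds)."""
    warnings = []
    threads = int(conf.get("network_threads", "0"))
    if threads > max_threads:
        warnings.append(f"network_threads={threads} is above {max_threads}")
    timeout = float(conf.get("network_timeout", "0"))
    if timeout > max_timeout_s:
        warnings.append(f"network_timeout={timeout:g} s is above {max_timeout_s:g} s")
    return warnings


if __name__ == "__main__":
    # Default path as used in this thread; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/pandora/pandora_server.conf"
    with open(path, encoding="utf-8", errors="replace") as fh:
        for warning in review(parse_conf(fh.read())):
            print("WARNING:", warning)
```

Run against the configuration above, it would flag network_threads 200 and leave network_timeout 4 alone.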
Sancho

Administrator
February 25, 2008 at 01:15

It seems to be a problem with the daemon startup, probably some error. Try this:

1. Check the logs (/var/log/pandora/).
2. Start the daemon manually:
pandora_network /etc/pandora/pandora_server.conf
3. Start the process and check whether it is alive with ps aux | grep pandora

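Step 3 can also be scripted. A minimal sketch that runs ps aux and keeps only lines mentioning "pandora", dropping the grep line that ps aux | grep pandora usually matches itself. The function name is hypothetical, and the process names it finds depend on your installation.

```python
import subprocess


def pandora_processes(ps_output: str) -> list:
    """Return `ps aux` lines mentioning pandora, minus the header and any grep."""
    lines = ps_output.splitlines()[1:]  # first line is the column header
    return [ln for ln in lines if "pandora" in ln and "grep" not in ln]


if __name__ == "__main__":
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True).stdout
    found = pandora_processes(out)
    print("\n".join(found) if found else "No pandora process is running")
```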
Raul77

Member
February 26, 2008 at 14:18

Hi

I’ve restarted the services and found this in the log files:

/etc/pandora/pandora_server.conf

2008/02/26 10:49:39 fontroja3_Data Launching Pandora FMS Data Server 1.3.1GX PS080210
2008/02/26 10:49:53 fontroja3_SNMP Launching Pandora FMS SNMP Console 1.3.1GX PS080210
2008/02/26 10:49:53 fontroja3_SNMP [V1] New SNMP logfile detected, resetting index
2008/02/26 10:49:54 fontroja3_Net Pandora FMS Network Server Shutdown by signal
2008/02/26 10:49:54 fontroja3_Net Launching Pandora FMS Network Server 1.3.1GX PS080210
2008/02/26 10:49:56 fontroja3_Data Launching Pandora FMS Data Server 1.3.1GX PS080210
2008/02/26 10:49:58 fontroja3_Recon Launching Pandora FMS Recon Server 1.3.1GX PS080210
2008/02/26 10:51:38 fontroja3_SNMP [V1] Server fontroja3_Net going Down

It seems as if the problem is in the SNMP server rather than the Network server.
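A minimal sketch for pulling shutdown events out of a log like the excerpt above. The line layout (date, time, source, optional [Vn] verbosity tag, message) is inferred from these few lines rather than from any documentation, and the marker strings and function names are assumptions.

```python
import re

# Layout inferred from the excerpt above: date, time, source, optional [Vn], message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?P<src>\S+) "
    r"(?:\[V(?P<level>\d+)\] )?(?P<msg>.*)$"
)
DOWN_MARKERS = ("going Down", "Shutdown")  # assumed wording of stop messages


def parse_line(line: str):
    """Return (timestamp, source, level or None, message), or None if unparsable."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    level = int(m["level"]) if m["level"] else None
    return m["ts"], m["src"], level, m["msg"]


def down_events(lines):
    """Yield (timestamp, source, message) for lines that report a server stopping."""
    for line in lines:
        rec = parse_line(line)
        if rec and any(marker in rec[3] for marker in DOWN_MARKERS):
            yield rec[0], rec[1], rec[3]


if __name__ == "__main__":
    # log_file path from the pandora_server.conf posted in this thread.
    with open("/var/log/pandora/pandora_server.log", encoding="utf-8", errors="replace") as fh:
        for ts, src, msg in down_events(fh):
            print(ts, src, msg)
```

Note that the "going Down" line is logged by fontroja3_SNMP but names fontroja3_Net, so it appears to report the Network server stopping rather than a failure of the SNMP console itself.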

I’ve read the log and found this:

pandora_snmptrap.log

Warning: no access control information configured.
This receiver will *NOT* accept any incoming notifications.
NET-SNMP version 5.4.rc2

What does this mean? Is there a problem with “NET-SNMP version 5.4.rc2”?

Thank you very much for the help.

raul

Member
February 26, 2008 at 16:05

I think it is an SNMP configuration problem (read this):

http://www.mail-archive.com/[email protected]/msg12499.html

Raúl

Raul77

Member
February 26, 2008 at 19:26

hi
I’ve tried to solve the problem with SNMP. Now the error in pandora_snmptrap.log no longer appears, but pandora_network_server has stopped again. I set the pandora_server log level to “noisy”, and here are a few lines from it:

2008/02/26 15:55:31 fontroja3_SNMP [V5] EVENT Insertion: INSERT INTO tevento (id_agente, id_grupo, evento, timestamp, estado, utimestamp) VALUES (0, 0, ‘Server fontroja3_Net going Down’, ‘2008-02-26 15:55:31’, 0, 1204037731)
2008/02/26 15:55:31 fontroja3_SNMP [V1] Server fontroja3_Net going Down
2008/02/26 15:55:36 fontroja3_Recon [V10] Loop in Recon Module Subsystem

As you can see, the server goes down again, and there is little in the logs that could explain the error.
If you don’t mind, I’ll try to explain the problem again: with version 1.3, I lose a lot of ICMP and SNMP polls when one ICMP- or SNMP-monitored agent goes down. Now I am trying to run version 1.3.1, but maybe there is another way to solve the problem, because I don’t think the problem is in your 1.3 system.

Thanks very much for your help.

Sancho

Administrator
February 26, 2008 at 19:38

Mmm. Try to run the pandora_network process WITHOUT the daemon launcher, for example:

/usr/bin/pandora_network /etc/pandora/pandora_server.conf
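When a program is started from Python's subprocess module instead of an interactive shell, a crash shows up as a negative return code (minus the signal number) rather than the shell's "Segmentation fault" message. A minimal sketch of running the server in the foreground and reporting how it ended; the binary and config paths are the ones quoted in this thread and may differ on your system, and describe_exit is a hypothetical helper.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:  # POSIX: the child was terminated by signal -returncode
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    cmd = ["/usr/bin/pandora_network", "/etc/pandora/pandora_server.conf"]
    # Runs in the foreground; its stdout/stderr go straight to this terminal.
    result = subprocess.run(cmd)
    print(describe_exit(result.returncode))
```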

Logs are by default in /var/log/pandora_server.error, not pandora_snmptrap.log (that is an internal file that logs all received traps, a different issue).

I think something is making your server go down at startup, and we’re not seeing the error during startup.

I hope this helps…

Raul77

Member
February 26, 2008 at 20:03

I’ve just done it, and here are the last lines of the output:
Starting up Network Consumer Thread # 99
All threads loaded and running
Starting up Network Producer Thread …

Segmentation fault

Raul77

Member

February 26, 2008 at 20:14

Although the message “segmentation fault” appears at the end of the run, the process keeps working. The error was in the startup scripts, as you said.
At this point I will try tuning the timeout and thread parameters and see what happens when one of the managed routers goes down.

Sancho

Administrator

February 27, 2008 at 13:50

Damn… 100 threads is A LOT. Try setting it to 20 or 30 first. Believe me, you don’t need that many (we have tested it with more than 2000 modules and no more than 40 threads, and it runs fast enough).
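The startup output above mentions a Network Producer Thread and numbered Network Consumer Threads, which suggests a producer/consumer queue. The toy model below is written under that assumption and is not Pandora's code: a producer queues one check per host, a fixed pool of consumer threads works through them, and a check against a dead host sleeps for the whole timeout. Host names, timings and function names are made up for illustration.

```python
import queue
import threading
import time


def check(host: str, dead: set, timeout_s: float) -> bool:
    """Simulated poll: a live host answers at once, a dead one costs the full timeout."""
    if host in dead:
        time.sleep(timeout_s)
        return False
    return True


def run_cycle(hosts: list, dead: set, threads: int, timeout_s: float) -> dict:
    """Poll every host once with a fixed pool of consumer threads."""
    work: queue.Queue = queue.Queue()
    results: dict = {}
    lock = threading.Lock()

    def consumer() -> None:
        while True:
            host = work.get()
            if host is None:  # sentinel: no more work for this thread
                return
            up = check(host, dead, timeout_s)
            with lock:
                results[host] = up

    pool = [threading.Thread(target=consumer) for _ in range(threads)]
    for t in pool:
        t.start()
    for host in hosts:  # producer: queue one check per host
        work.put(host)
    for _ in pool:  # one sentinel per consumer
        work.put(None)
    for t in pool:
        t.join()
    return results


if __name__ == "__main__":
    hosts = [f"router{i}" for i in range(20)]
    dead = {"router0", "router1", "router2"}
    for threads in (1, 4):
        start = time.monotonic()
        results = run_cycle(hosts, dead, threads, timeout_s=0.5)
        elapsed = time.monotonic() - start
        print(f"{threads} thread(s): {sum(results.values())} hosts up, cycle took {elapsed:.1f} s")
```

With one thread the three dead hosts are checked back to back, so the cycle typically takes about three timeouts; with four threads they are waited on in parallel while the live hosts are handled by whichever thread is free. A moderate pool with a short timeout is usually enough, which is consistent with the advice above that very large thread counts are unnecessary.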

Raul77

Member

March 3, 2008 at 16:05

hi
I’ve set 40 threads and there aren’t any problems with the server. The problem of losing polls is solved with the new server version. Thanks for your help.

Sancho

Administrator

March 4, 2008 at 13:56

Great!!!

Welcome to the Pandora FMS Community!