Pandora_Server Process Runaway
Posted by Murigar on December 16, 2016 at 01:56
I have had my Pandora server for many years and have been through many successful upgrades.
In the last year or so the pandora_server process will now and then act up and consume 100% of CPU resources.
(This happens anywhere from every day to every other month.)
A reboot will resolve the issue for an unpredictable amount of time.
A restart of the pandora_server service will not resolve the issue.
Within CentOS:
# top will show 120-200% utilization.
# iostat -x 1 will show basically no utilization.
Memory utilization does not really increase either.
It appears to be strictly the CPU utilization that gets out of hand.
I would appreciate any assistance in troubleshooting this recurring issue.
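One way to narrow this down is to see which thread inside pandora_server is using the CPU while the runaway is in progress. Running top -H -p <pid> shows this interactively; the sketch below reads the same counters from /proc so the result can be logged. It assumes Linux, Python 3.6 or later (newer than the stock CentOS 6 interpreter), and the stat field layout documented in proc(5). The function names are made up for illustration and are not part of Pandora FMS.

```python
import os
import time


def parse_task_stat(stat_line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name sits in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = stat_line.rpartition(")")
    name = head.split("(", 1)[1]
    fields = tail.split()
    # After the name, fields[0] is the state (field 3 in proc(5)),
    # so utime (field 14) and stime (field 15) sit at indexes 11 and 12.
    return name, int(fields[11]) + int(fields[12])


def sample_threads(pid):
    """Map thread id -> (name, cumulative CPU ticks) for every thread of pid."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as handle:
                ticks[int(tid)] = parse_task_stat(handle.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and open()
    return ticks


def busiest_threads(pid, interval=5.0, top_n=5):
    """Return up to top_n (ticks_used, tid, name) tuples, busiest first."""
    before = sample_threads(pid)
    time.sleep(interval)
    after = sample_threads(pid)
    deltas = [
        (after[tid][1] - before[tid][1], tid, after[tid][0])
        for tid in after
        if tid in before
    ]
    return sorted(deltas, reverse=True)[:top_n]
```

Calling busiest_threads(2388) as root (2388 being the pandora_server PID in the diagnostic output of this thread; it changes on every restart) would list the threads that burned the most CPU ticks over five seconds.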
4 Replies
-
::
Output of # ./pandora_diagnostic.sh below
Information gathered at 20161215_152819
Linux pandorafms.████████████ 2.6.32-642.11.1.el6.x86_64 #1 SMP Fri Nov 18 19:25:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
=========================================================================
-----------------------------------------------------------------
CPUINFO
-----------------------------------------------------------------
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
stepping : 6
microcode : 210
cpu MHz : 2992.499
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc arch_perfmon pebs bts tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni ssse3 cx16 x2apic hypervisor lahf_lm dtherm
bogomips : 5984.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
stepping : 6
microcode : 210
cpu MHz : 2992.499
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc arch_perfmon pebs bts tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni ssse3 cx16 x2apic hypervisor lahf_lm dtherm
bogomips : 5984.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

-----------------------------------------------------------------
MEMINFO
-----------------------------------------------------------------
MemTotal:       13924732 kB
MemFree:        13056804 kB
Buffers:          148744 kB
Cached:           160592 kB
SwapCached:            0 kB
Active:           562368 kB
Inactive:         133996 kB
Active(anon):     387192 kB
Inactive(anon):      228 kB
Active(file):     175176 kB
Inactive(file):   133768 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3096572 kB
SwapFree:        3096572 kB
Dirty:                44 kB
Writeback:             0 kB
AnonPages:        387084 kB
Mapped:            26084 kB
Shmem:               400 kB
Slab:              46208 kB
SReclaimable:      18372 kB
SUnreclaim:        27836 kB
KernelStack:        3152 kB
PageTables:        10784 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10058936 kB
Committed_AS:     810156 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      179744 kB
VmallocChunk:   34359547264 kB
HardwareCorrupted:     0 kB
AnonHugePages:     40960 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10240 kB
DirectMap2M:    14325760 kB

-----------------------------------------------------------------
Other System Parameters
-----------------------------------------------------------------
Uptime: 15:28:19 up 2:01, 1 user, load average: 0.61, 0.71, 0.57

-----------------------------------------------------------------
PROC INFO (Pandora)
-----------------------------------------------------------------
root     1903  0.0  0.0  145536   7416 ?      SN   13:28   0:02 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
pandora  2343  0.0  0.0   18604    960 ?      Ss   13:28   0:00 /usr/bin/anytermd --port 8023 --user pandora -c telnet %p
pandora  2345  0.0  0.0   18604    956 ?      Ss   13:28   0:00 /usr/bin/anytermd --port 8022 --user pandora -c ssh %p
root     2388 25.7  2.2 1661464 310408 ?      Ssl  13:28  30:52 /usr/bin/perl /usr/bin/pandora_server /etc/pandora/pandora_server.conf -D
root     2393  0.0  0.0  191152   2268 ?      Ss   13:28   0:00 /usr/sbin/snmptrapd -t -On -n -a -Lf /var/log/pandora/pandora_snmptrap.log -p /var/run/pandora_snmptrapd.pid --format1=SNMPv1[**]%4y-%02.2m-%l[**]%02.2h:%02.2j:%02.2k[**]%a[**]%N[**]%w[**]%W[**]%q[**]%vn --format2=SNMPv2[**]%4y-%02.2m-%l[**]%02.2h:%02.2j:%02.2k[**]%b[**]%vn
pandora  2433  0.0  0.0  228124   9692 ?      Ss   13:28   0:01 /usr/bin/perl /usr/bin/tentacle_server -a 0.0.0.0 -p 41121 -s /var/spool/pandora/data_in -i.*.conf:conf;.*.md5:md5;.*.zip:collections -d
root     6852  0.0  0.0  106112   1224 pts/0  S+   15:28   0:00 /bin/bash ./pandora_diagnostic.sh
root     6860  0.0  0.0  103320    840 pts/0  S+   15:28   0:00 grep pandora

-----------------------------------------------------------------
MySQL Configuration file
-----------------------------------------------------------------

-----------------------------------------------------------------
Pandora FMS Server Configuration file
-----------------------------------------------------------------
#############################################################################
# Pandora FMS Server Parameters
# Pandora FMS, the Flexible Monitoring System.
# Version 5.0SP2
# Licensed under GPL license v2,
# (c) 2003-2013 Artica Soluciones Tecnologicas
# http://www.pandorafms.com
# Please change it for your setup needs
#############################################################################

# Servername: Name of this server
# if not given, it takes hostname. It's preferable to setup one
# because machine name could change by some reason.
#servername adama

# incomingdir: Defines directory where incoming data packets are stored
# You could set directory relative to base path or absolute, starting with /
incomingdir /var/spool/pandora/data_in

# log_file: Main logfile for pandora_server
# You could set file relative to base path or absolute, starting with /
log_file /var/log/pandora/pandora_server.log

# Log file for Pandora FMS SNMP console. Its generated by NetSNMP Trap daemon
snmp_logfile /var/log/pandora/pandora_snmptrap.log

# Error logfile: aux logfile for pandora_server errors (in Daemon mode)
# You could set file relative to base path or absolute, starting with /
errorlog_file /var/log/pandora/pandora_server.error

# daemon: Runs in daemon mode (background) if 1, if 0 runs in foreground
# this could be also configured on commandline with -D option
# daemon 1

# dbengine: mysql, postgresql or oracle (mysql by default)
dbengine mysql

# Database credentials. A VERY important configuration.
# This must be the same credentials used by your Pandora FMS Console
# but could be different if your console is not running in the same
# host than the server. Check your console setup in /include/config.php

# dbname: Database name (pandora by default)
dbname pandora

# dbuser: Database user name (pandora by default)
dbuser pandora

# dbhost: Database hostname or IP address
dbhost localhost

# dbport: Database port number
# Default value depends on the dbengine (mysql: 3306, postgresql: 5432, oracle: 1521)
#dbport 3306

# By default, parent agent will not update
#update_parent 0

# verbosity: level of detail on errors/messages (0 default, 1 verbose, 2 debug.... 10 noisy)
# -v in command line (verbose) or -d (debug). Set this to 10 when try to locate problems and
# set to 0 or 1 on production enviroments.
verbosity 0

# Master Server, 1 if master server (normal mode), 0 for slave mode (slave in multi-server setup)
master 1

# Activate Pandora SNMP console (depending on snmptrapd)
snmpconsole 1

# snmptrapd will ignore authenticationFailure traps if set to 1.
snmp_ignore_authfailure 1

# snmptrapd will read the PDU source address instead of the agent-addr field is set to 1.
snmp_pdu_address 0

# Activate (1) Pandora Network Server
networkserver 1

# Activate (1) Pandora Data Server
dataserver 1

# Activate (1) Pandora FMS Recon server
reconserver 1

# pluginserver : 1 or 0. Set to 1 to activate plugin server with this setup
pluginserver 1

# Pandora FMS Plugin exec tool filepath (by default at /usr/bin)
plugin_exec /usr/bin/timeout

# predictionserver : 1 or 0. Set to 1 to activate prediction server with this setup
# DISABLED BY DEFAULT
predictionserver 0

# wmiserver : 1 or 0. Set to 1 to activate WMI server with this setup
# DISABLED BY DEFAULT
wmiserver 1

# Network timeout (in seconds) for timeout in network connections for Network agents
network_timeout 3

# Server keepalive (in seconds)
server_keepalive 45

# Server Threshold: defines number of seconds of main loop (in sec)
server_threshold 10

# Network threads: Do not set too high (~40). Each threads make a network module check.
network_threads 5

# icmp_checks x : defines number of pings for each icmp_proc module type. at least one of
# that ping should be 1 to report 1
icmp_checks 1

# tcp specific options :
# tcp_checks: number of tcp retries if first attempt fails.
# tcp_timeout: specific timeout for tcp connections
tcp_checks 1
tcp_timeout 30

# snmp specific options :
# snmp_checks: number of snmp request retries if first attempt fails.
# snmp_timeout: specific timeout for snmp request.
snmp_checks 1
snmp_timeout 5

# snmp_proc_deadresponse 1 (default): Return DOWN if cannot contact
# or receive NULL from a SNMP PROC module.
snmp_proc_deadresponse 1

# plugin_threads: Specify number of plugin server threads for processing plugin calls
plugin_threads 1

# plugin_timeout: Specify number of seconds calling plugin exec waiting for response
# after this time, call is aborted and result is "unknown".
plugin_timeout 15

# wmi_timeout : specific timeout for wmi request.
wmi_timeout 20

# wmi_threads: Specify number of WMI server threads for processing WMI remote calls
wmi_threads 2

# recon_threads. Each thread will scan a different scantask.
recon_threads 1

# dataserver_threads: Number of threads for data server (XML processing threads)
dataserver_threads 1

# mta_address: External Mailer (MTA) IP Address to be used by Pandora FMS internal email capabilities
mta_address localhost

# mta_port, this is the mail server port (default 25)
#mta_port 25

# mta_user MTA User (if needed for auth, FQD or simple user, depending on your server)
#mta_user [email protected]

# mta_auth MTA Auth system (if needed, it supports LOGIN, PLAIN, CRAM-MD5, DIGEST-MD)
#mta_auth LOGIN

# mta_from Email address that sends the mail, by default is pandora@localhost
#     probably you need to change it to avoid problems with your antispam
#mta_from Pandora FMS

# Set 1 if want eMail deliver alert in separate mail (default).
# Set 0 if want eMail deliver shared mail by all destination.
mail_in_separate 1

# xprobe2: Optional package to detect OS types using advanced TCP/IP
# fingerprinting tecniques, much more accurates than stadard nmap.
# If not provided, nmap is used insted xprobe2
xprobe2 /usr/bin/xprobe2

# nmap: If provided, is used to detect OS type with recon server using
# advanded OS fingerprint technique. Xprobe2 gives more accurate results
# Nmap is also used to do TCP port scanning in detected host.
nmap /usr/bin/nmap

# snmpget: Needed to do SNMP checks. By default is on /usr/bin/snmpget
snmpget /usr/bin/snmpget

# Location of the braa binary needed by the Enterprise SNMP Server (/usr/bin/braa by default) (PANDORA FMS ENTERPRISE ONLY).
braa /usr/bin/braa

# Number of retries before braa hands a module over to the Network Server (PANDORA FMS ENTERPRISE ONLY).
braa_retries 3

# Default group id for new agents created with Pandora FMS Data Server
autocreate_group 2

# Set to 1 if want to autocreate agents with Pandora FMS Data Server,
# set to 0 to disable
autocreate 1

# max_log_size: Specify max size of Pandora FMS server log file (1MB by default). If
# log file grows above this limit, is renamed to "pandora_server.log.old".
max_log_size 65536

# max_queue_files (500 by default)
# When server have more than max_queue_files in incoming directory, skips the read
# the directory to avoid filesystem overhead.
max_queue_files 500

# Use the XML file last modification time as timestamp.
# use_xml_timestamp 1

# Pandora FMS will autorestart itself each XXX seconds, use this if you experience problems with
# shutting down threads, or other stability problems.
# auto_restart 86400

# Pandora FMS will restart after restart_delay seconds on critical errors.
# restart 0
# restart_delay 60

# More information about GIS Setup in /usr/share/pandora_server/util/gis.README
# Flag to activate GIS (positional information for agents and maps)
# by default it is desactivated
activate_gis 0

# Radius of error in meters to consider two gis locations as the same location.
#location_error 50

# Recon reverse geolocation mode [disabled, sql, file]
# disabled  The recon task doesn't try to geolocate the ip discovered.
# sql       The recon task trys to query the SQL database to geolocate the
#           ip discovered
# file      The recon task trys to find the geolocation information of the
#           ip discovered in the file indicated in the
#           recon_reverse_geolocation_file parameter
# recon_reverse_geolocation_mode disabled

# Recon reverse geolocation file. This is the database with the reverse
# geolocation information using MaxMind GPL GeoLiteCity.dat format).
#recon_reverse_geolocation_file /usr/local/share/GeoIP/GeoIPCity.dat

# Radius (in meters) of the circle in where the agents will be place randomly
# when finded by a recon task. Center of the circle is guessed
# by geolocating the IP.
#recon_location_scatter_radius 1000

# Pandora Server self-monitoring (embedded agent) (by default enabled)
self_monitoring 1

# Update parent from the agent xml
#update_parent 1
#
#
# This enable realtime reverse geocoding using Google Maps public api.
# This requires internet access, and could have performance penalties processing GIS
# information due the connetion needed to resolve all GIS input.
# NOTE: If you dont pay the service to google, they will ban your IP in a few days.
google_maps_description 1

# This enable realtime reverse geocoding using Openstreet Maps public api.
# This requires internet access, and could have performance penalties processing GIS
# information due the connetion needed to resolve all GIS input.
# You can alter the code to use a local (your own) openstreet maps server.
# openstreetmaps_description 1

# Enable (1) or disable (0) Pandora FMS Event Web Server (PANDORA FMS ENTERPRISE ONLY).
webserver 1

# Number of threads for the Web Server (PANDORA FMS ENTERPRISE ONLY).
web_threads 1

# Uncomment to perform web checks with CURL instead of LWP.
#web_engine curl

# Enable (1) or disable (0) Pandora FMS Inventory Server (PANDORA FMS ENTERPRISE ONLY).
inventoryserver 1

# Number of threads for the Web Server (PANDORA FMS ENTERPRISE ONLY).
inventory_threads 1

# Enable (1) or disable (0) Pandora FMS Export Server (PANDORA FMS ENTERPRISE ONLY).
exportserver 0

# Number of threads for the Export Server (PANDORA FMS ENTERPRISE ONLY).
export_threads 1

# Enable (1) or disable (0) Pandora FMS Event Server (PANDORA FMS ENTERPRISE ONLY).
eventserver 0

# Event Server event window in seconds (3600 by default) (PANDORA FMS ENTERPRISE ONLY).
event_window 3600

# Enable (1) or disable (0) Pandora FMS Enterprise ICMP Server (PANDORA FMS ENTERPRISE ONLY).
# You need nmap 5.20 or higher in order to use this !
icmpserver 0

# Number of threads for the Enterprise ICMP Server (PANDORA FMS ENTERPRISE ONLY).
icmp_threads 2

# Enable (1) or disable (0) Pandora FMS Enterprise SNMP Server (PANDORA FMS ENTERPRISE ONLY).
# Check braa tool is running and operative.
snmpserver 0

# Number of threads for the Enterprise SNMP Server (PANDORA FMS ENTERPRISE ONLY).
snmp_threads 2

# Block size for block producer/consumer servers, that is, the number of modules
# per block (15 by default) (PANDORA FMS ENTERPRISE ONLY).
block_size 15

# If set to 1, process XML data files in a stack instead of a queue. 0 by default.
# WARNING: Incremental modules will not work properly if dataserver_lifo is set to 1!!!
dataserver_lifo 0

# If set to 1, the policy manager is enabled and the server is listening the policy queue.
# 0 by default (PANDORA FMS ENTERPRISE ONLY)
policy_manager 1

# If set to 1, the event replicate process is enabled. 0 by default. (PANDORA FMS ENTERPRISE ONLY)
# WARNING: This process doesn't do anything if is not properly configured from the console setup
event_replication 0

# If set to 1, new events validate older event for the same module. This will
# affect the performance of the server. This was the "normal behaviour" on previous (4.x) versions.
# disable only if you really know what you are doing !!.
event_auto_validation 1

# If defined, events generated by Pandora FMS will be written to the specified text file.
#event_file /var/log/pandora/pandora_events.txt

# Set the maximum number of traps that will be processed from a single source in a
# configured time interval.
#snmp_storm_protection 10

# Time interval for snmp_storm protection (in seconds).
#snmp_storm_timeout 600

# Default texts for some events. The macros _module_ and _data_ are supported.
#text_going_down_normal Module '_module_' is going to NORMAL (_data_)
#text_going_up_critical Module '_module_' is going to CRITICAL (_data_)
#text_going_up_warning Module '_module_' is going to WARNING (_data_)
#text_going_down_warning Module '_module_' is going to WARNING (_data_)
#text_going_unknown Module '_module_' is going to UNKNOWN

# Events older that the specified time (in seconds) will be auto-validated. Set to 0 to disable this feature.
event_expiry_time 0

# Only events more recent than the specified time window (in seconds) will be auto-validated. This value must
# be greater than event_expiry_time.
#event_expiry_window 86400

mta_address 127.0.0.1
mta_port 25
mta_from Pandora FMS

-----------------------------------------------------------------
Pandora FMS Logfiles information
-----------------------------------------------------------------
total 264
drwxr-xr-x.  2 pandora root  4096 Dec 15 13:28 .
drwxr-xr-x. 13 root    root  4096 Dec 15 15:12 ..
-rw-r--r--.  1 root    root  1007 Dec 15 13:28 pandora_agent.log
-rw-rw-rw-   1 root    root    43 Jul 26 16:17 pandora_alert.log
-rw-rw-rw-.  1 root    root 86058 Dec 15 15:01 pandora_server.error
-rw-rw-rw-   1 root    root 10812 Nov  4 03:13 pandora_server.error-20161104.gz
-rw-rw-rw-   1 root    root 10733 Nov 24 03:17 pandora_server.error-20161124.gz
-rw-rw-rw-   1 root    root 11294 Dec 12 04:14 pandora_server.error-20161212.gz
-rw-rw-rw-   1 root    root 40159 Apr 13  2016 pandora_server.log
-rw-rw-rw-   1 root    root 65576 May 29  2014 pandora_server.log.old
-rw-rw-rw-   1 root    root    21 Dec 15 13:28 pandora_snmptrap.log
-rw-rw-rw-   1 root    root     4 Dec 15 13:28 pandora_snmptrap.log.index

-----------------------------------------------------------------
System disk
-----------------------------------------------------------------
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_pandorafms-lv_root   50G  4.3G   45G   9% /
tmpfs                              6.7G     0  6.7G   0% /dev/shm
/dev/sda1                          477M  173M  279M  39% /boot
/dev/mapper/vg_pandorafms-lv_home   11G  462M  9.3G   5% /home

-----------------------------------------------------------------
Vmstat (5 execs)
-----------------------------------------------------------------
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd     free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 13056788 148744 160592    0    0    13    97  685  185  5 13 80  2  0
 0  0      0 13056556 148744 160628    0    0     0    96  395  244  2  5 92  2  0
 0  0      0 13056556 148744 160628    0    0     0  1560  157  167  1  1 92  7  0
 1  0      0 13056400 148760 160612    0    0     0   200 6450 1000 11 49 37  3  0
 2  1      0 13056540 148852 160556    0    0     0   400 8832 1290 16 59 23  3  0

-----------------------------------------------------------------
System dmesg
-----------------------------------------------------------------
DMESG OMITTED
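In the vmstat samples above, the last two rows show system time (sy) at 49 and 59 with interrupts (in) at 6450 and 8832, while the first three rows stay at 13 or below for sy and under 700 for in. This diagnostic was taken two hours after a reboot, so it is not established that these rows reflect the runaway state. As a minimal sketch, the following parses the rows and flags the kernel-heavy ones; the 40% threshold and the function names are assumptions chosen for illustration.

```python
# Column names as printed in the second vmstat header line.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

# The five samples from the diagnostic output above.
VMSTAT_SAMPLE = """\
0 0 0 13056788 148744 160592 0 0 13 97 685 185 5 13 80 2 0
0 0 0 13056556 148744 160628 0 0 0 96 395 244 2 5 92 2 0
0 0 0 13056556 148744 160628 0 0 0 1560 157 167 1 1 92 7 0
1 0 0 13056400 148760 160612 0 0 0 200 6450 1000 11 49 37 3 0
2 1 0 13056540 148852 160556 0 0 0 400 8832 1290 16 59 23 3 0
"""


def parse_vmstat(text):
    """Turn vmstat data rows into dicts, skipping header and malformed lines."""
    rows = []
    for line in text.splitlines():
        values = line.split()
        if len(values) != len(VMSTAT_COLUMNS):
            continue
        if not all(value.isdigit() for value in values):
            continue  # header line such as "r b swpd free ..."
        rows.append(dict(zip(VMSTAT_COLUMNS, (int(v) for v in values))))
    return rows


def kernel_heavy(rows, sy_threshold=40):
    """Return the rows whose system-time percentage reaches the threshold."""
    return [row for row in rows if row["sy"] >= sy_threshold]


if __name__ == "__main__":
    for row in kernel_heavy(parse_vmstat(VMSTAT_SAMPLE)):
        print(f"sy={row['sy']}% us={row['us']}% in={row['in']} cs={row['cs']}")
```

Capturing vmstat 1 output during an actual runaway and feeding it through the same parser would show whether the time is spent in user space (us) or in the kernel (sy).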
-
::
Hi Murigar
CPU utilisation above 100% in the output of the top command is normal. The percentage is summed across all available processors in your server, so if you have 8 processors, up to 800% is available. Even if your CPU utilisation reaches 200-400%, I think that is normal.
What issue are you facing specifically which makes you think that it’s a CPU issue?
Regards
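To put numbers on this for the server in question: the diagnostic output posted in this thread lists two processors (processor 0 and processor 1), so the reported 120-200% corresponds to 60-100% of the whole machine. The sketch below does that conversion, assuming top is in its default per-core mode, where 100% means one fully busy core. The function names are illustrative only.

```python
def cores_from_cpuinfo(cpuinfo_text):
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def share_of_machine(top_percent, cores):
    """Convert a per-core top percentage into a percentage of the whole machine."""
    return top_percent / cores


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        cores = cores_from_cpuinfo(handle.read())
    for reading in (120, 200):
        print(f"{reading}% in top = {share_of_machine(reading, cores):.0f}% of {cores} cores")
```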
-
::
Hello Murigar,
Have you checked whether a large number of pandora_server threads or processes are open when your server reaches high CPU usage?
Have you changed the configuration, for example by increasing the number of server threads (dataserver_threads, network_threads…)?
Also, keep in mind what ayad99 explained accurately above. A different version of the top program may even display CPU utilization in a different way.
Kind regards,
Antonio.
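The thread counts asked about here can be read straight from the configuration file. As a rough sketch, the following collects every active *_threads directive from pandora_server.conf text, skipping commented lines. The path comes from the process list in the diagnostic output; the helper name is hypothetical, and some of these directives apply only to Enterprise servers, so the configured values are not expected to add up to the number of threads actually running.

```python
def thread_settings(conf_text):
    """Return {directive: value} for every active '*_threads' line."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split()
        if len(parts) >= 2 and parts[0].endswith("_threads"):
            settings[parts[0]] = int(parts[1])
    return settings


if __name__ == "__main__":
    # Path taken from the pandora_server command line shown by the diagnostic.
    with open("/etc/pandora/pandora_server.conf") as handle:
        found = thread_settings(handle.read())
    for name, value in sorted(found.items()):
        print(f"{name} {value}")
```

Comparing this output before and after an upgrade is a quick way to confirm whether any thread setting changed.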
::
ayad99 – The reason I believe this is abnormal: the “Lag” listed under “Pandora Servers” is generally 0-5 seconds, with 0-10 items queued.
CPU as listed in VMware is generally about 10% utilized.
When this runaway occurs, the “Lag” spikes to 20 minutes with hundreds of items in the queue.
CPU as listed in VMware will sit at 100% utilized.
With this much “Lag”, things start triggering and cause false alerts.
It will sit this way for hours, and only a reboot clears it up.
antonio.s –
During normal operation there is one process with 19 threads. I believe this does not increase when it is acting up.
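To turn "I believe" into a measurement, the thread count can be read from /proc/<pid>/status and logged while the server is healthy and again while it is acting up. This is a minimal sketch assuming Linux and Python 3.6 or later; the function names are invented for illustration.

```python
def thread_count(status_text):
    """Extract the integer after 'Threads:' from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Threads":
            return int(value)
    raise ValueError("no 'Threads:' line found")


def read_thread_count(pid):
    """Read the current thread count of a running process."""
    with open(f"/proc/{pid}/status") as handle:
        return thread_count(handle.read())
```

Calling read_thread_count with the pandora_server PID once a minute from cron, and appending the result to a file, would show whether the count stays at 19 through an episode.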