Welcome to Pandora FMS Community!

  • strange results for GENERIC_DATA_INC due to inexact sleep

    Posted by daggett on January 12, 2007 at 15:38

    hi,
    After some weeks of testing, I regularly get incoherent data for GENERIC_DATA_INC modules:
    the cpu_user and cpu_sys modules send the number of ticks read by parsing /proc/stat.
    This number is an ever-incrementing counter (I don’t know what happens if the counter resets, though: how does Pandora handle that?).
    I send only actual_val/300 as the INC value and let the Pandora Server do the subtraction, and this often gives values like 121%.
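For context, these counters come from the aggregate `cpu` line of `/proc/stat`, where the fields after the label are user, nice, system, idle, iowait and so on, in USER_HZ ticks. A minimal Python sketch of extracting the two values, assuming the standard Linux layout (`parse_cpu_ticks` is a hypothetical helper, not the agent's actual code):

```python
from typing import Tuple


def parse_cpu_ticks(stat_text: str) -> Tuple[int, int]:
    """Return (user_ticks, system_ticks) from the aggregate 'cpu' line.

    Assumes the Linux /proc/stat layout: after the "cpu" label come
    user, nice, system, idle, iowait, ... counters in USER_HZ ticks.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines are labelled "cpu0", "cpu1", ...; only the
        # aggregate line is labelled exactly "cpu".
        if fields and fields[0] == "cpu":
            user = int(fields[1])    # time spent in user mode
            system = int(fields[3])  # time spent in kernel mode
            return user, system
    raise ValueError("no aggregate 'cpu' line found")
```

Reading `/proc/stat` twice, some seconds apart, and subtracting the two results gives the tick deltas that an incremental module is meant to track.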

    This is due to the use of the sleep command in the Pandora Agent script to time execution. When a program is consuming all the CPU on the monitored computer:
    – the sleep command (e.g. sleep 300) lasts more than 300 seconds,
    – the script itself becomes very slow to execute, so Agent_execution_time can exceed 300 seconds (sometimes close to 360 seconds, i.e. 6 minutes, instead of a few seconds).

    So the program still assumes 300 seconds (5 minutes) and divides by 300… In fact the whole cycle (execution plus wait) can take more than 600 seconds (10 minutes), twice the expected time, so I could get values near 200% for cpu_sys!!
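The arithmetic behind the near-200% figure, as a small Python sketch. It assumes USER_HZ = 100 (the usual Linux value; `getconf CLK_TCK` reports it) and a single CPU, so a fully busy CPU accumulates 100 ticks per second; `cpu_percent` is a hypothetical helper:

```python
# Assumed tick rate; check `getconf CLK_TCK` on the monitored host.
USER_HZ = 100


def cpu_percent(delta_ticks: int, seconds: float, user_hz: int = USER_HZ) -> float:
    """Percentage of one CPU consumed over `seconds`, given a tick delta."""
    return delta_ticks / seconds / user_hz * 100


# One fully busy CPU over a cycle that really lasted 600 s.
delta = 600 * USER_HZ                # 60000 ticks accumulated
nominal = cpu_percent(delta, 300)    # divides by the configured interval
actual = cpu_percent(delta, 600)     # divides by the real elapsed time
print(nominal, actual)
```

By the same arithmetic, on a saturated CPU a cycle stretched to about 363 s (21% over the nominal 300 s) is enough to read as roughly 121%.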

    So with every cycle the schedule slides a bit further, sometimes making the server think the data is missing.

    So maybe we could use something other than sleep, such as cron? It would be far more precise and would let us optimize the code.
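The difference between `sleep` and cron is that cron fires at fixed clock times no matter how long the previous run took, whereas `sleep 300` after each run adds the run time to every cycle. The same property can be had inside a loop by anchoring the schedule to a start time. A minimal Python sketch (both function names are hypothetical, not part of the Pandora agent):

```python
import math
import time


def seconds_until_next_run(start: float, now: float, interval: float) -> float:
    """Wait time until the next slot start + k*interval strictly after now.

    Slots are anchored to `start`, so a slow run shortens the following wait
    instead of pushing every later run back, as a fixed sleep would.
    Slots missed entirely by an overrunning run are skipped, not queued.
    """
    k = math.floor((now - start) / interval) + 1
    return start + k * interval - now


def run_forever(task, interval: float = 300.0) -> None:
    """Hypothetical agent loop: run `task` on a drift-free schedule."""
    start = time.monotonic()  # monotonic clock: immune to wall-clock changes
    while True:
        task()
        time.sleep(seconds_until_next_run(start, time.monotonic(), interval))
```

This removes only the cumulative drift: on a saturated CPU an individual run can still start late, so the real elapsed time between two samples remains the right divisor for the counter delta.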

    Since most GENERIC_DATA_INC incremental data is strongly time-dependent, this situation makes such data useless and meaningless.

    bye for now!

  • Sancho

    Administrator
    January 12, 2007 at 23:55

    hi,

    You’re right: on slow or heavily loaded systems, when using this with agents (not with the network server), you can get some imprecision.
    For the next version we will use the agent contact time (local), which is much more precise.
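Using the agent's local time amounts to dividing each counter delta by the time that actually elapsed between two readings. A sketch of the idea, assuming each sample carries the local time at which the counter was read (`Sample` and `rate_between` are hypothetical names). Treating a backwards counter as a reset and skipping that interval is one common policy; it is not a statement of how the Pandora server handles resets.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    timestamp: float  # local time, in seconds, when the counter was read
    counter: int      # raw cumulative counter value


def rate_between(prev: Sample, cur: Sample) -> Optional[float]:
    """Per-second increase between two samples, using their own timestamps.

    Returns None when the counter went backwards (reboot or wrap) or the
    timestamps do not increase, so the caller can drop that interval
    instead of reporting a negative or meaningless value.
    """
    elapsed = cur.timestamp - prev.timestamp
    delta = cur.counter - prev.counter
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed
```

Dividing the returned ticks-per-second by USER_HZ and multiplying by 100 gives a CPU percentage that stays correct however long the cycle actually took.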

    Thanks for your observation, as usual you’re testing Pandora in depth 🙂

  • daggett

    Member
    January 15, 2007 at 13:43

    OK, thanks,
    but the problem will remain until we can record the exact time each data_inc value was collected, either by putting that time in the CDATA of the module or by including the exact time each module was executed in the XML data files.
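One way to carry the collection time per module is an extra element next to the data. A Python sketch using `xml.etree.ElementTree`; the `<timestamp>` child and its `YYYY/MM/DD HH:MM:SS` format are assumptions for illustration, not a documented part of the Pandora XML data format, and `module_element` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed timestamp layout, chosen for illustration only.
TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def module_element(name: str, mtype: str, value: int, collected_at: datetime) -> ET.Element:
    """Build a <module> element that records when its value was read.

    The <timestamp> child is hypothetical: it is not a documented part of
    the Pandora agent XML format.
    """
    module = ET.Element("module")
    ET.SubElement(module, "name").text = name
    ET.SubElement(module, "type").text = mtype
    ET.SubElement(module, "data").text = str(value)
    ET.SubElement(module, "timestamp").text = collected_at.strftime(TS_FORMAT)
    return module


# Example values only.
elem = module_element("cpu_user", "generic_data_inc", 61000, datetime(2007, 1, 10, 8, 5, 0))
print(ET.tostring(elem, encoding="unicode"))
```

With the collection time travelling alongside each value, the server could compute increments against real elapsed time even when a file arrives late.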

    I also noticed that when old data is processed, alarms can be triggered, but the displayed date/time is the date/time when the server processed the XML data:
    in my case, the XML data files can be processed several days later after a connection failure, so I get alarms triggered when that old data is processed, and the date/time displayed for these alarms is the processing time, not the date/time the data was collected.
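The alarm behaviour described above comes down to which clock labels the alarm. A sketch of the intended rule, assuming the data file carries a per-sample collection timestamp in a hypothetical `YYYY/MM/DD HH:MM:SS` format (`alert_time` is a hypothetical helper, not Pandora server code):

```python
from datetime import datetime
from typing import Optional

# Hypothetical timestamp layout for the collection time in the data file.
TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def alert_time(collected_at: Optional[str], processed_at: datetime) -> datetime:
    """Time to display on an alarm.

    Prefers the time the data was collected; falls back to the server's
    processing time only when the data carries no collection timestamp.
    """
    if collected_at:
        return datetime.strptime(collected_at, TS_FORMAT)
    return processed_at


# Example: data collected before a link failure, processed days later.
processed = datetime(2007, 1, 13, 9, 0, 0)
shown = alert_time("2007/01/10 08:05:00", processed)
print(shown, processed - shown)
```

The printed difference is how stale the data was when it reached the server, which is exactly the information lost when alarms are stamped with the processing time.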

    bye for now!