This type of monitoring can run into several problems, from network incidents to processing problems on the server. The checks below cover the most common causes and help restore the correct status of these modules.
Main network problems:
- The device the module targets is powered off or has no network connection.
- A network segment is down, so the Pandora FMS server or Satellite server cannot reach the target to run the check.
- An intermediate element is blocking the connection. Firewalls, proxies and similar devices sometimes block the requests and cause this behavior.
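To tell these network causes apart from server-side ones, it can help to repeat the check by hand from the machine that runs the Pandora FMS server or Satellite server. The following is a minimal sketch, not part of Pandora FMS: the function names are hypothetical, and it assumes a Linux host with the iputils `ping` command, where `-c` is the number of echo requests and `-W` is the reply timeout in seconds.

```python
import subprocess


def build_ping_command(host: str, timeout_s: int = 3, count: int = 1) -> list:
    """Build a Linux (iputils) ping command line.

    Sends `count` echo requests and waits up to `timeout_s` seconds per reply.
    """
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def is_reachable(host: str, timeout_s: int = 3) -> bool:
    """Return True if the host answers a single ICMP echo request."""
    try:
        result = subprocess.run(
            build_ping_command(host, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            # Hard stop in case ping itself hangs (for example on slow DNS).
            timeout=timeout_s + 2,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Either ping never finished or the ping binary is not installed.
        return False
    return result.returncode == 0
```

Calling `is_reachable("192.0.2.10")` (a documentation address, to be replaced with the module's target) from the server returns `False` when the device is off, the segment is down, or something in between drops ICMP. If it returns `True` while the module stays in the wrong status, the processing problems described next are the more likely cause.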
Main processing problems:
- The module interval is too low. Module executions can then queue up, so we recommend not using intervals shorter than 1 minute.
- General server queuing problems. If there are many ICMP modules and some of them are in unknown state, your Pandora FMS server may not have enough threads to process all the modules. In that case we recommend increasing the number of threads slightly, without exceeding 50 threads in total across all the internal Pandora FMS servers.
- Response time problems. If the targets take several seconds to respond to ping/fping, you can set a timeout for these modules in the Pandora FMS server configuration file:
network_timeout 3
If the check does not complete successfully within the 3 seconds defined by this parameter, the execution is cut off and the module status will be wrong. Keep in mind that a very high timeout can cause performance problems on the server, depending on the speed of each check and the total number of ICMP modules.
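The trade-off between timeout, thread count and number of modules can be estimated with some rough arithmetic. The sketch below is an illustration only and does not describe how the Pandora FMS server actually schedules checks: it assumes that modules are spread evenly across threads, that each thread runs its checks one after another, and that every check runs until the timeout. The function names are hypothetical.

```python
import math


def worst_case_cycle_seconds(icmp_modules: int, threads: int, timeout_s: float) -> float:
    """Upper bound on one pass over all ICMP modules if every check times out.

    Simplifying assumption: modules are split evenly across threads and each
    thread processes its share sequentially.
    """
    if icmp_modules < 0 or threads < 1 or timeout_s < 0:
        raise ValueError("need icmp_modules >= 0, threads >= 1, timeout_s >= 0")
    checks_per_thread = math.ceil(icmp_modules / threads)
    return checks_per_thread * timeout_s


def fits_interval(icmp_modules: int, threads: int, timeout_s: float, interval_s: float) -> bool:
    """Return True if the worst-case pass finishes within one module interval."""
    return worst_case_cycle_seconds(icmp_modules, threads, timeout_s) <= interval_s


# 2000 modules on 10 threads with a 3-second timeout: 200 checks per thread,
# so up to 600 seconds per pass, which does not fit a 300-second interval.
print(worst_case_cycle_seconds(2000, 10, 3))  # 600
print(fits_interval(2000, 10, 3, 300))        # False
print(fits_interval(2000, 20, 3, 300))        # True
```

Under these assumptions, doubling the timeout doubles the worst-case pass time, which is why a high timeout combined with many ICMP modules can leave executions queued and modules in unknown state.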