Pandora: Documentation en: Services

From Pandora FMS Wiki
Revision as of 09:42, 15 October 2020 by Laura.cano (talk | contribs) (Creating a New Service)

1 Service Monitoring

1.1 Introduction

A service is a way to group IT resources based on their features.

A service could be an official website, a CRM system, a support application, or even printers. Services are logical groups which can include hosts, routers, switches, firewalls, CRMs, ERPs, websites and, of course, other services.

In Pandora FMS, services are represented as a group of monitored elements (modules, agents or other services) whose individual status affects in a certain way the global performance of the service provided.

1.2 Services under Pandora FMS

1.2.1 How Services work under Pandora FMS

Basic monitoring in Pandora FMS consists of collecting metrics from different sources, representing them as monitors (modules).

Service monitoring lets you group these modules so that, by setting thresholds based on accumulated failures, you can monitor groups of different types of elements and their relationship within a larger, more general service.

In short, service monitoring lets you check the status of a global service: you can tell whether the service is being provided normally (green), is degraded (yellow) or is not being provided at all (red).

This is an example to better understand what service monitoring is all about.

Suppose you want to monitor a web application that is balanced across a series of redundant elements. The infrastructure on which the application is based could consist of the following elements:

  • Two routers in HA.
  • Two switches in HA.
  • 20 Apache Web Servers.
  • Four Weblogic Appliance Servers.
  • One MySQL Cluster consisting of two Storage and two SQL Processing Nodes.

The goal is to find out whether the web application works properly. That is the final assessment by customers, whether the application works or not.

The need to monitor services as something "abstract" arises when faced with the following question:

What happens to an application if a non-critical element fails?

For example, if one of the twenty Apache servers fails, in theory there is no need to raise a warning, because the purpose of so much redundancy is to cover exactly this kind of situation. But then, which failures should raise a warning: all of them, or only some? What is the rule for warning?

You might think that Pandora FMS should only warn you if a highly critical element fails (for example a router) or if several Apache servers fail.

Service monitoring in Pandora FMS is designed to answer these questions.

The services in Pandora FMS help you to:

  • Limit the number of received alerts. You will receive alerts about situations that compromise the reliability of the services you provide.
  • Track the compliance level.
  • Simplify the monitoring display of your infrastructure.

To achieve this, monitor every element that could negatively affect your application.

Through Pandora FMS console, define a service tree in which to indicate both the elements that affect your application, as well as their impact degree.

All elements added to the service trees will correspond to information that is already being monitored, either in the form of modules, specific agents or other services.

To indicate how much the status of each element affects the overall status, a weight sum system is used: the most important elements (those with more weight) push the overall status of the service towards an incorrect status sooner than less important elements (those with less weight).

Practical example:

  • Switches and routers: 5 points each when in critical, and 3 points if in warning.
  • Web servers: 1.2 points for each one in critical; warning status is disregarded.
  • WebLogic Servers: 2 points each in critical.
  • MySQL Cluster: 5 points for each node in critical and 3 points in warning.

Weight assignment per element type

Element type       Normal   Warning   Critical   Unknown
Web server         0        0         1.2        1.2
Weblogic server    0        0         2          2
MySQL server       0        3         5          5

A warning threshold of 4 and a critical threshold of 6 are set for the service. That way, the service is "OK" as long as all the monitored elements are OK, or the failing ones are not important enough to degrade the service.

Service configuration
Normal Warning Critical
0 >=4 >=6

But suppose that one (1) Apache web server fails:

  • 1 x Apache server in CRITICAL x 1.2 points = 1.2. Because 1.2 < 4 (warning threshold), the service is still in OK status.

The weight contribution will be:

2 x 0 (routers OK)
+ 2 x 0 (switches OK)
+ 19 x 0 (apache OK)
+ 1 x 1.2 (apache CRIT)
+ 4 x 0 (weblogic OK)
+ 1 x 0 (mysql OK)
Total: 1.2 --> The service will be NORMAL

Now this is what happens if a web server and a Weblogic server fail:

  • 1 x Apache Server in CRITICAL x 1.2 points = 1.2
  • 1 x Weblogic Server in CRITICAL x 2 = 2

The total, 3.2, is still < 4, so the service remains in OK status. It is still working and no immediate technical action is necessary.

The weight contribution will be:

2 x 0 (routers OK)
+ 2 x 0 (switches OK)
+ 19 x 0 (apache OK)
+ 1 x 1.2 (apache CRIT)
+ 3 x 0 (weblogic OK)
+ 1 x 2 (weblogic CRIT)
+ 1 x 0 (mysql OK)
Total: 3.2 --> The service will be NORMAL

This is what happens if two web servers and a Weblogic server fail:

  • 2 x Apache Server in CRITICAL x 1.2 points = 2.4
  • 1 x Weblogic Server in CRITICAL x 2 = 2

The total is 4.4, which is > 4, so the service goes into WARNING status: it is degraded. It continues to work and may not require immediate technical action, but it is clear that there is a problem with your infrastructure.

2 x 0 (routers in OK)
+ 2 x 0 (switches in OK)
+ 18 x 0 (apache OK)
+ 2 x 1.2 (apache CRIT)
+ 3 x 0 (weblogic OK)
+ 1 x 2 (weblogic CRIT)
+ 1 x 0 (mysql OK)
Total: 4.4 --> The service will be in WARNING

Suppose that in addition to the above, a Router fails:

  • 2 x Apache Server in CRITICAL x 1.2 points = 2.4
  • 1 x Weblogic Server in CRITICAL x 2 = 2
  • 1 x Router in CRITICAL x 5 = 5

Now 9.4 is above the threshold of 6 set for CRITICAL, so the service goes into critical status: it is not working and immediate technical action is required.

1 x 0 (routers OK)
+ 1 x 5 (router CRIT)
+ 2 x 0 (switches OK)
+ 18 x 0 (apache OK)
+ 2 x 1.2 (apache CRIT)
+ 3 x 0 (weblogic OK)
+ 1 x 2 (weblogic CRIT)
+ 1 x 0 (mysql OK)
Total: 9.4 --> The service is in CRITICAL

Pandora FMS will alert the corresponding work team (operators, technicians, etc.).
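The arithmetic of the four scenarios above can be reproduced with a short script. This is only an illustrative sketch: the dictionary layout and the function names are assumptions made for the example and are not part of any Pandora FMS API. The weights and thresholds are the ones from the practical example, and the thresholds are compared with >= as in the service configuration table.

```python
# Weights per element type and status, copied from the practical example.
# Names and structure are hypothetical; this is not Pandora FMS code.
WEIGHTS = {
    "router":   {"warning": 3, "critical": 5},
    "switch":   {"warning": 3, "critical": 5},
    "apache":   {"warning": 0, "critical": 1.2},
    "weblogic": {"warning": 0, "critical": 2},
    "mysql":    {"warning": 3, "critical": 5},
}

WARNING_THRESHOLD = 4
CRITICAL_THRESHOLD = 6


def service_weight(failures):
    """Sum the weights of the elements that are not in normal status.

    failures is a list of (element_type, status) pairs. Elements in
    normal status contribute 0, so they are simply left out.
    """
    return sum(WEIGHTS[element][status] for element, status in failures)


def service_status(weight):
    """Map a weight sum to a service status."""
    if weight >= CRITICAL_THRESHOLD:
        return "CRITICAL"
    if weight >= WARNING_THRESHOLD:
        return "WARNING"
    return "NORMAL"


scenarios = {
    "1 Apache": [("apache", "critical")],
    "1 Apache + 1 Weblogic": [("apache", "critical"), ("weblogic", "critical")],
    "2 Apache + 1 Weblogic": [("apache", "critical")] * 2 + [("weblogic", "critical")],
    "2 Apache + 1 Weblogic + 1 router": [("apache", "critical")] * 2
    + [("weblogic", "critical"), ("router", "critical")],
}

for name, failures in scenarios.items():
    total = service_weight(failures)
    print(f"{name}: {total:.1f} -> {service_status(total)}")
```

Keeping the weights in a single table makes it easy to test how a change in one weight or threshold would affect the status in each failure scenario.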

Service monitoring is a feature only available in Pandora FMS Enterprise version.

How simple mode works

The weight system might be too complex when monitoring needs are quite basic. That is why a new simple mode is available on the service configuration.

Within this mode, it is only required to point out which elements are critical and which ones are not.

Only the elements checked as critical will be taken into account to carry out the calculations and only the critical element status will have actual value.

  • When a percentage between 0 and 50% of the critical elements is in critical status the service will go into warning status.
  • When more than 50% of the critical elements go into critical, the service will go into critical status.
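The rule above can be sketched as a small function. This is a hedged illustration, not Pandora FMS code: the function name and the input format are made up for the example. It assumes that exactly 50% still counts as warning, that 0% leaves the service in normal status, and that a service without critical elements is reported as normal (the last point is purely an assumption of this sketch).

```python
def simple_mode_status(elements):
    """Return the service status in simple mode.

    elements is a list of (is_critical_element, status) pairs.
    Only elements marked as critical are counted, and only when their
    own status is "critical".
    """
    key_statuses = [status for is_key, status in elements if is_key]
    if not key_statuses:
        # Assumption of this sketch: nothing to evaluate means normal.
        return "normal"

    failed = sum(1 for status in key_statuses if status == "critical")
    ratio = failed / len(key_statuses)

    if ratio > 0.5:
        return "critical"
    if ratio > 0:
        return "warning"
    return "normal"


# Hypothetical service: router and Apache server are critical elements,
# the printer is not.
service = [
    (True, "critical"),   # router
    (False, "critical"),  # printer (ignored)
    (True, "warning"),    # Apache server (counts only when critical)
]
print(simple_mode_status(service))
```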

Follow this example of a simple service:

  • Router as critical element.
  • Printer as non critical element.
  • Apache server as critical element.

At some point, the elements report this status:

  • Router on critical.
  • Printer on critical.
  • Apache server on warning.

The service status is warning. The printer is not a critical element, so its status is not taken into account. The Apache server is a critical element, but it only counts when it is in critical status. That leaves one critical element in critical status, that is, 50% of the critical elements.

Another situation might be the elements reporting these statuses:

  • Router on critical.
  • Printer on critical.
  • Apache server on critical.

The service status is critical, since over 50% of the critical elements are in critical status.

Finally, the elements might report these statuses:

  • Router on normal.
  • Printer on critical.
  • Apache on normal.

The service status is normal, since none of the critical elements is in critical status. Only the printer is, and since it is not a critical element, it is not taken into account in the calculations.

Root services

From Pandora FMS version 7.0 OUM726 onwards, services are evaluated slightly differently.

From this version on, the services that are evaluated are those that are not part of another service, which are known as root services. This logical change speeds up monitoring and minimizes work queues.

In addition, when a service defined in a Pandora FMS node appears as an element of a Metaconsole root service, the Metaconsole server is the one that evaluates it, updating the values stored in the node.

This provides a more efficient distributed logic and makes it possible to apply a cascade protection system based on services. This point is fully discussed in [Cascade service protection].

Metaconsole service possibilities have also been extended: other services, modules or agents can now be added as service elements. In previous versions, only node services could be added.

1.2.2 Creating a New Service

Introduction

The Enterprise version is required and the PredictionServer component must be enabled to be able to use these services.


The services may represent:

  • Modules
  • Full agents
  • Other Services

Service values are calculated using the Prediction Server.

Once all the devices are monitored, add to each service all the modules, agents or sub-services that you need to monitor the service. For example, if you want to monitor the Online Store service, you need a module for content, a service that monitors the state of communications and so on. The following steps describe how to create a service with Pandora FMS.

To create a new service, click on Services at the Topology Maps menu.

Menu services.png

A tree view containing all the available services will be shown.

Arbol servicios.png

Initial Configuration

To create a new service, click on the 'Create' button and fill out the form.

Formulario servicios.png

The names of the form fields are:

  • Name: Name of the service.
  • Description: Service description, a mandatory long text. This description will appear in the service map, the service table view and the service widget (instead of the name).
  • Group: Group to which the service belongs, useful to organize it and to apply ACL restrictions.
  • Agent to save the data: The service saves its data in special data modules (specifically, prediction modules), so an agent is needed to contain those modules and the alarms that you will configure later in this same form. Note: the interval at which all the service module calculations are performed depends on the interval of the agent configured as container.
  • Mode: Mode in which the element weights will be calculated. It may have 2 values:
    • Smart: The service's weights and elements will be calculated automatically based on established rules.
    • Manual: The service's weights and elements will be indicated manually with fixed values.

The smart mode is only available from Pandora FMS version 7.0NG 748.

The automatic and simple modes of previous versions will become manual by applying the MR 40 in the version update.


  • Critical: Weight threshold to declare the service as critical. In smart mode this value will be a percentage. We will explain later how the elements contribute to this value.
  • Warning: Weight threshold to declare the service as in warning status. In smart mode, this value will be a percentage. We will explain later how the elements contribute to this value.
  • Unknown elements as critical: It allows you to indicate that elements in an unknown state contribute their weight as if they were a critical element.
  • Favorite: It marks the service as a favorite. This creates a direct link in the side menu and lets you filter the services in the views based on this criterion.

Servicios favoritos.png

  • Silent mode: It activates the silence mode of the service, so it will not generate alerts or events.
  • Cascade protection enabled: It activates cascade protection over the service elements. These will not generate alerts or events if they belong to a service (or sub-service) that is in a critical state.
  • Calculate continuous SLA: It activates the creation of SLA and SLA value modules for the current service. If disabled, the dynamically calculated SLA information will not be available, nor will the alerts on SLA compliance for this service. It is used for cases where the number of services required is so high that it can affect performance. If this option is disabled, once the service has been created, the data history of these modules will be deleted, so information will be lost.
  • SLA Interval: Time period to calculate the effective SLA of the service.
  • SLA Limit: Threshold of time in OK status that the service must reach for the SLA to be considered met during the period configured in the previous field.
  • Service alert in warning status: Alert template the service uses to launch an alert when it goes into warning status.
  • Alert of the service in critical state: Alert template the service uses to launch an alert when it goes into critical status.
  • Alert of the service in unknown state: Alert template the service uses to launch an alert when it goes into unknown status.
  • SLA critical alert: Alert template the service uses to launch an alert when the service SLA is not met.

Element Configuration

Once the form has been correctly filled in, you will have an empty service, which must be filled in with elements as described below. In the service edition form, select the 'Configure elements' tab.

Elementos servicios.png

By clicking on Add element, a pop-up window with a form will appear. The form will be slightly different if the service is in smart mode or in manual mode.

Formulario elementos servicios.png

The form fields are:

  • Description: Optional text that will be used to represent the element on the service map. If not indicated, the name of the module, agent or service (depending on the added element) will be used.
  • Type: Drop-down list to choose whether the element will be a service, module or agent. In smart mode services you can also choose the dynamic type.
  • Agent: Intelligent agent search engine. Only visible if the element to create or edit is of agent or module type.
  • Module: Drop-down list with the modules of the agent previously chosen in the intelligent search engine. This control is only visible if the element to create or edit is of module type.
  • Service: Drop-down list of the services that can be added as an element. Only visible if the element to create or edit is of service type. Note that the drop-down list only shows services that are not ancestors of the current service. This is necessary to keep a correct dependency tree structure between services.

The following fields will only be available for dynamic elements, in services in smart mode:

  • Type of elements that match: Drop-down list to choose whether the elements that are evaluated against the dynamic rules, and that become part of the service, are agents or modules.
  • Filter by group: Rule that indicates the group the element must belong to in order to be part of the service.
  • With agent name: Rule that indicates the agent name the element must have to be part of the service. Enter a text that must be contained in the name of the desired agent.
  • With module name: Rule that indicates the module name the element must have to be part of the service. Enter a text that must be contained in the name of the desired module.
  • With custom field name: Rule that indicates the custom field name the element must have to be part of the service. Enter a text that must be contained in the name of the desired custom field.
  • With custom field value: Rule that indicates the custom field value the element must have to be part of the service. Enter a text that must be contained in the value of the desired custom field.

For example, if the configuration of a dynamic element was:

Elementos dinamicos servicios.png

All the modules whose name includes "Host Alive", in an agent whose name includes "SW", inside the "Servers" group, with a custom field whose name includes "Department" and whose value includes "Systems", would be used as service elements.
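The way these rules combine can be pictured with the following sketch. It is not how Pandora FMS implements dynamic elements: the classes, field names and the matches function are hypothetical, and the sketch assumes that the group must match exactly, that name rules are case-sensitive substring checks, and that the custom field name and value rules must be satisfied by the same custom field.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical, simplified view of a monitored module."""
    name: str
    agent_name: str
    group: str
    custom_fields: dict = field(default_factory=dict)


@dataclass
class DynamicRule:
    """Hypothetical container for the dynamic element rules."""
    group: str
    agent_name: str
    module_name: str
    custom_field_name: str
    custom_field_value: str


def matches(module, rule):
    """Return True when the module satisfies every rule."""
    if module.group != rule.group:
        return False
    if rule.agent_name not in module.agent_name:
        return False
    if rule.module_name not in module.name:
        return False
    # At least one custom field must match both the name and the value rule.
    return any(
        rule.custom_field_name in name and rule.custom_field_value in value
        for name, value in module.custom_fields.items()
    )


rule = DynamicRule("Servers", "SW", "Host Alive", "Department", "Systems")

candidates = [
    Module("Host Alive", "SW-core-01", "Servers", {"Department": "Systems"}),
    Module("Host Alive", "router-01", "Servers", {"Department": "Systems"}),
    Module("CPU Load", "SW-core-01", "Servers", {"Department": "Systems"}),
]

selected = [m for m in candidates if matches(m, rule)]
print([m.agent_name + " / " + m.name for m in selected])
```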

The following fields will only be available for services in manual mode:

  • Critical: Weight that the element will add to the service when in critical state.
  • Warning: Weight that the element will add to the service when in warning state.
  • Unknown: Weight that the element will add to the service when in unknown state.
  • Normal: Weight that the element will add to the service when in normal state.

To calculate the status of a service, the weight of each of its elements will be added based on its status, and if it exceeds the thresholds established in the service for warning or critical, the status of the service will change to warning or critical accordingly.

In smart mode services, since no weights are defined for the elements, the way their status is calculated is as follows:

  • Critical elements contribute their full percentage to the weight of the service. This means that if, for example, there are 4 elements in the service and only 1 of them is critical, that element will add 25% to the weight of the service. If instead of 4 elements there were 5, the critical element would add 20% to the weight of the service.
  • Warning elements contribute half of their percentage to the weight of the service. This means that if for example a service has 4 elements and only 1 of them is in warning status, that element will add 12.5% to the weight of the service. If instead of 4 elements there were 5, the warning element would add 10% to the weight of the service.
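The percentages in the two points above follow from a simple formula: each critical element counts as one full share of the service and each warning element as half a share. The function below is a minimal sketch of that arithmetic; its name and parameters are assumptions made for illustration only.

```python
def smart_weight(total_elements, critical=0, warning=0):
    """Return the service weight, as a percentage, in smart mode.

    Each critical element contributes its full share (100 / total) and
    each warning element contributes half of its share.
    """
    if total_elements <= 0:
        raise ValueError("a service needs at least one element")
    return (critical + 0.5 * warning) * 100 / total_elements


print(smart_weight(4, critical=1))  # one critical element out of four
print(smart_weight(5, critical=1))  # one critical element out of five
print(smart_weight(4, warning=1))   # one warning element out of four
print(smart_weight(5, warning=1))   # one warning element out of five
```

The resulting percentage is the value that is compared with the Critical and Warning percentages configured for the service.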

Dynamic elements are not affected by service cascade protection.

Modules created when configuring a service

  • SLA Value Service: The percentage value of SLA compliance. (async_data).
  • Service_SLA_Service: This shows whether the SLA is met or not. (async_proc).
  • Service_Service: This module shows the sum of the service weights. (async_data).

1.2.3 Service Visualization

Simple all-service view

This is the operation list that shows all created services. It only shows the services in groups that the user of the Pandora FMS console has access to.

To get to this view, go to the Operation menu, open the Monitoring entry and select the Services section.

Services list services admin v5.png

Each row represents a service, and the columns represent:

  • Name: The name of the service.
  • Description: The service's short description.
  • Group: The icon of the group the service belongs to.
  • Critical: The threshold value for weight sums to get the service into 'critical' status.
  • Warning: The threshold value for weight sums to get the service into 'warning' status.
  • Value: The current value for weight sums for the service.
  • Status: An icon that represents the status of the service. Four possible statuses are represented:
    • Red: The service is in 'critical' status because the value exceeded the critical threshold.
    • Yellow: The service is in 'warning' status because the value equaled or exceeded the warning threshold.
    • Green: The service is within the 'normal' range because weight sum does not reach the threshold.
    • Gray: The service is in 'unknown' status. This usually means the service has been recently created and does not contain any modules or the Pandora FMS Prediction server is down.
  • SLA: The current value of the SLA Service. The values can be:
    • OK: The SLA is met for the interval defined in the SLA service.
    • INCORRECT: The SLA is not met for the interval currently defined in the SLA Service.
    • N/A: The SLA is in 'unknown' status because there is not enough data to perform the calculation.

Table including all services

A table for quick display including all visible services and their current status.

Servs.JPG

Simple list of a service and its elements

This view is accessible by clicking on the name of a service in the list of all services, or through the magnifying glass icon tab in the service title header.

Pandora FMS will show a page similar to the one shown in the following screenshot:

Services list elements operation v5.png

In the screenshot, two sections can be distinguished: the service, with the same columns as in the previous view, at the top, and the list of the elements that make up this service at the bottom.

The list of elements appears in table format, where the rows correspond to each element and the columns represent:

  • Type: The icon which represents the element type: a building block for modules, stacked blocks for agents and a network diagram icon for services.
  • Name: The text which contains the name of the module, agent or service. They are also linked to the corresponding section.
  • Description: Short description.
  • Weight critical: The value if the element is in 'critical' status.
  • Weight warning: The value if the element is in 'warning' status.
  • Weight normal: The value if the element is in 'normal' status.
  • Data: The value of the element. It can adopt the following modes:
    • Module: The value of the module.
    • Agents: The text that displays the agent's status.
    • Service: The weight sum of the elements of the service that has been chosen as the element for the parent service.
  • Status: The icon which represents the element's status by color.

Keep in mind that service element calculation is performed by the Prediction Server, so what you see is not real-time data. In some situations, such as when a module or agent has just been added to the service, its weight will not be updated until the Prediction Server performs the calculation again.

Service map view

This view displays the service as a tree, as you can see in the following screenshot. That way, it is possible to quickly see how modules, agents or sub-services influence service monitoring. For sub-services, you can also see which elements influence their status through the weight sum.

Services servicemap v5.png

The possible nodes can be:

  • Module Node: It is represented by the 'heartbeat' icon. This node is always final (leaf).
  • Agent Node: It is represented by the 'CPU box' icon. This node is always final too (leaf).
  • Service Node: It is represented by the 'crossed hammer and wrench' icon. This node is not a final node. It is required to contain additional nodes.

The colors of the nodes and of the arrows which connect them to the service depend on the node's status: as always, green for OK, red for critical, yellow for warning and grey for unknown.

There are the following attributes within the node:

  • Title: The name of the service's / agent's or module's node, accompanied by the agent.
  • Value list:
    • Critical: The weight the node contributes when in 'critical' status, except if it is the root-service node, where it represents the threshold to reach 'critical' status.
    • Warning: The weight the node contributes when in 'warning' status, except if it is the root-service node, where it represents the threshold to reach 'warning' status.
    • Normal: The weight the node contributes when in 'normal' status, except if it is the root-service node, in which case nothing is displayed here.
    • Unknown: The weight the node contributes when in 'unknown' status, except if it is the root-service node, where it represents the threshold to reach 'unknown' status.

You may click on each node in the tree. The target link represents the operational view of the node itself.


When the service mode is simple, a red exclamation mark appears on the right side of the critical elements.

Services within the Visual Console

From Pandora FMS version 5 onwards, you may add services in the Visual Console like any other item on the map.


To create a service item on a map, the process is the same as for all other visual map items, but the options palette will be the same as in the screenshot.


It contains the following attributes:

  • Label: The title shown within the visual console's node.
  • Service: Drop-down list that shows the services it has access to, to add to the map.

Note that a service item, unlike other items in the visual map, cannot be linked to other visual maps: its clickable link in the visual console always leads to the service map tree view described above.

Service tree view

This view allows you to view services in the form of a tree.

Each level shows the total number of elements included in each service or agent.

  • Services: It reports the total number of services, agents and modules that belong to that service.
  • Agents: It reports the number of modules in critical state (red color), warning (yellow color), unknown (gray color), uninitiated (blue color) and normal state (green color).

Services that do not belong to another one will always be shown on the first level. In the case of a child service, it will be shown nested inside its parent.

Services treeview.png

ACL permission restriction is only applied to the first level.


1.2.4 How to read service values

Planned shutdowns added before the stop date allow recalculating the value of SLA reports, given that it allows "backwards" recalculating with scheduled shutdowns added afterwards (that option is globally activated in the general setup). When it is an SLA service report, if there is a scheduled shutdown that affects one or several service elements, the scheduled shutdown is considered to affect the service as a whole, since the shutdown impact on the whole service cannot be measured.

It is worth highlighting that this is at a report level. Therefore, service trees, and the information presented in the visual console are not altered based on planned shutdowns added after the intended execution date. These service compliance percentages are calculated in real time, based on the history data of the same service, they do not have anything to do with the actual report.

On the other hand, it is important to know how the compliance percentage of a service is calculated:

Suppose there is a service defined with 95% compliance over a 1-hour interval. A table of values is used, where t is time, s indicates whether the service complies or not (1 it complies, 0 it fails), and x is the compliance % (SLA). In 1 hour there should be exactly 12 samples (assuming the interval is 5 minutes long).

Consider a case where the service complies for the first 11 samples (first 55 minutes) and fails at the 60th minute. These would be the values:

   t    |   s   |    x  
1          1      100
2          1      100
3          1      100
4          1      100
5          1      100
6          1      100
7          1      100
8          1      100
9          1      100
10         1      100
11         1      100
12         0      91.6

This case is easy to calculate. The % depends on the number of samples: for example, at t3 there are three samples in total and all of them comply, so the value is 100%, whereas at t12 there are 12 samples and 11 of them are valid: 11 / 12.

Now suppose the failure takes place in the middle of the series and the value slowly recovers:

   t    |   s   |    x  
1          1      100
2          1      100
3          1      100
4          1      100
5          1      100
6          0      83.3
7          1      85.7
8          1      87.5
9          1      88.8
10         1      90
11         1      90.9
12         1      91.6

So far all seems similar to the previous scenario, but see what happens if you go over time:

   t    |   s   |    x  
13        1      91.6
14        1      91.6
15        1      91.6
16        1      91.6
17        1      91.6
18        1      100
19        1      100

Now there is unintuitive behavior, because the volume of valid samples remains 11 for a time window that goes up to t18, where the only invalid value is out of the window, so in t18 compliance becomes 100%. This step between 91.6 and 100 is explained by the size of the window. The larger the window is (usually SLA calculation interval is daily, weekly or monthly), the less abrupt the step will be.
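The step described above can be reproduced with a sliding window over the samples. The following is a minimal sketch under the assumptions of the example (one sample every 5 minutes, a window of 12 samples); the function name is hypothetical and this is not the actual Pandora FMS implementation.

```python
from collections import deque


def rolling_compliance(samples, window=12):
    """Return the compliance % after each sample.

    samples is a sequence of 1 (complies) and 0 (fails). Only the last
    `window` samples are taken into account; while fewer samples are
    available, the percentage is computed over the ones seen so far.
    """
    recent = deque(maxlen=window)  # old samples drop out automatically
    result = []
    for sample in samples:
        recent.append(sample)
        result.append(100 * sum(recent) / len(recent))
    return result


# t1..t19 with a single failure at t6, as in the tables above.
samples = [1] * 5 + [0] + [1] * 13
x = rolling_compliance(samples)

for t, value in enumerate(x, start=1):
    # Values are rounded when printed, so 11/12 appears as 91.7
    # where the tables above show the truncated 91.6.
    print(f"{t:>2}  {samples[t - 1]}  {value:.1f}")
```

From t13 to t17 the failed sample at t6 is still inside the window, so the value stays at 11 / 12. At t18 the window covers t7 to t18, the failure drops out and the value jumps to 100%.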

Weight calculation on simple mode

Weights are handled slightly differently in simple mode, since only the critical weight exists and the service can go into only two statuses apart from normal. Each element receives weight 1 when in critical status and 0 in any other status, and service weights are recalculated every time the service elements change. The warning weight of the service can be overlooked: it always has value 0.5, because if it were 0 the service would always be at least in warning status, but warning weights are not used in simple mode. The critical weight of the service is calculated as half of the sum of the element critical weights, each of which is 1. For example, if there are 3 elements, the service critical weight is 1.5. The server is then in charge of checking whether the critical weight has been exceeded or matched to render the service into critical or warning status.

1.2.5 Service cascade protection

From update OUM725 on, service elements can be muted dynamically.

This avoids an alert overload for each element that belongs to a certain service or sub-service.

When the 'service cascade protection' feature is active, the action linked to the template configured for the root service is executed. It reports which elements have an incorrect status within the service.

It is important to take into account that this system allows the alerts of the elements within the service to be triggered when they go into critical status, even if the general service status is correct.

Service cascade protection indicates which elements have failed regardless of the depth of the defined service.


In the example above, one of the elements of the service is in critical status. Even if the main service is correct, Pandora FMS warns of the critical state of the elements within, triggering the alert related to the element in critical status.

1.2.6 Root cause analysis

You may have an endless number of sub-services (paths) within a service. In previous versions, Pandora FMS alerted indicating the service status (normal, critical, warning, etc.). From OUM725 on, there is a new macro available that will show the service status root cause.

To use it, add the following text to the template linked to the service:

Alert body: Example message
The series of events that have caused the service status is the following one:

This will return an output similar to this one:

Alert body: Example message
The series of events that have caused the service status is the following one:
[Web Application -> HW -> Apache server 3]
[Web Application -> HW -> Apache server 4]
[Web Application -> HW -> Apache server 10]
[Web Application -> DB Instances -> MySQL_base_1]
[Web Application -> DB Instances -> MySQL_base_5]
[Web Application -> Balanceadores ->]

From this output, it can be deduced that:

  • Apache servers 3, 4 and 10 are in critical status
  • MySQL_base databases 1 and 5 are down
  • The balancer does not respond

This additional information helps to find the reason behind the service status, reducing the time spent investigating the cause of a failure.
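As a rough illustration of how such chains can be derived, the sketch below walks a nested service tree and builds one bracketed path per element in critical status. It is not the way Pandora FMS computes the macro: the dictionary layout and the `root_cause_chains` function are assumptions made for this example only.

```python
def root_cause_chains(name, node, path=()):
    """Return one '[A -> B -> C]' chain per critical leaf below node.

    Assumed layout: a service is a dict of named children and a
    monitored element is a plain status string.
    """
    path = path + (name,)
    if isinstance(node, str):
        # Leaf element: report it only if it is in critical status.
        return ["[" + " -> ".join(path) + "]"] if node == "critical" else []
    chains = []
    for child_name, child in node.items():
        chains.extend(root_cause_chains(child_name, child, path))
    return chains


service = {
    "HW": {"Apache server 3": "critical", "Apache server 5": "ok"},
    "DB Instances": {"MySQL_base_1": "critical"},
}
for chain in root_cause_chains("Web Application", service):
    print(chain)
# [Web Application -> HW -> Apache server 3]
# [Web Application -> DB Instances -> MySQL_base_1]
```

Each chain records the full path from the root service down to the failing element, which is what allows the cause to be located without opening every sub-service.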

1.2.7 Service grouping

Services are logical groupings that make up an organization's business structure. Since services depend on each other in many cases, it may make sense to group them, creating for example one overall service (the company) and more specific services (corporate website, communications, etc.). To group services, create both the general and the more specific services, then add the specific ones to the general one to build a logical tree-shaped structure.

These groups may help you to create visual maps, configure alerts, apply monitoring policies, etc. It is therefore possible to create alerts that warn you when the business goes into critical status because sales representatives cannot do their job, or when a branch is not working at full capacity due to technical problems with the ERP service.

To understand more clearly what service grouping is, take a look at these examples.

1.2.8 Service monitoring examples

Pandora FMS service

This example monitors the status of the Pandora FMS monitoring service, which is made up of the Apache and MySQL services, the Pandora FMS server and Tentacle. Each of these elements is in turn a service containing different components, so that service grouping creates a tree-shaped structure.


In this case, the general Pandora FMS service goes into critical status when its weight reaches 2 and into warning status when its weight reaches 1. As seen, the four components have different weights in the Pandora FMS service:

  • MySQL: It is essential for the Pandora FMS service. It has an individual weight of 2 if MySQL is down, and a weight of 1 if it is in warning status, showing a warning in the Pandora FMS service.
  • Pandora Server: It is essential for the Pandora FMS service. It has an individual weight of 2 if the Pandora FMS Server is down, and a weight of 1 if it is in warning status (for example, due to CPU overload), escalating the warning up to the general Pandora FMS service.
  • Apache: Its failure implies a degradation of the Pandora FMS service, but not a total interruption, so it gets an individual weight of 1 if it is down, showing the warning status in the Pandora FMS service.
  • Tentacle: Its failure entails a degradation, and certain components may fail, but it does not mean Pandora FMS stops working completely, so its individual weight in case of failure is 1, showing a warning in the general service.
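The weights listed above can be checked with a short Python sketch. It assumes that the weights of all failing elements are added up and the total is compared with the critical (2) and warning (1) thresholds, and that a status with no stated weight contributes 0. The `WEIGHTS` table and the `service_status` function are hypothetical names used only for this illustration.

```python
# Individual weights taken from the list above; statuses without a
# stated weight are assumed to contribute 0.
WEIGHTS = {
    "MySQL": {"critical": 2, "warning": 1},
    "Pandora Server": {"critical": 2, "warning": 1},
    "Apache": {"critical": 1},
    "Tentacle": {"critical": 1},
}
CRITICAL_AT = 2  # the service goes critical when its weight reaches 2
WARNING_AT = 1   # the service goes into warning when its weight reaches 1


def service_status(statuses):
    """Add up the weights of the reported statuses and compare with the thresholds."""
    total = sum(WEIGHTS[name].get(status, 0) for name, status in statuses.items())
    if total >= CRITICAL_AT:
        return "critical"
    if total >= WARNING_AT:
        return "warning"
    return "ok"


print(service_status({"Apache": "critical"}))  # warning
print(service_status({"MySQL": "critical"}))   # critical
```

Under this summing assumption, Apache and Tentacle failing at the same time would also add up to 2 and place the general service in critical status.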

The following image shows the weight configuration of the elements that determine the general status of the Pandora FMS service:

Pesos.JPG

Cluster storing service, service grouping

Services are logical groups that make up part of the business structure of an organization. Grouping services therefore makes sense, since some services are not fully meaningful on their own. To group services, simply add them to a larger service as elements, creating a new logical group.

In the following example, there is an HA storage cluster. Here, a system of two file servers working at the same time has been chosen, each one monitoring the percentage and the status of a series of hard drives that serve particular departments, creating a tree-shaped grouped service structure.


According to this structure, the critical threshold of the company's storage service is reached when both file servers fail, since that would bring the service down, while the failure of just one of them would only degrade the service. The following image shows the weight configuration assigned to the two main elements of the storage service:


This image shows the content and weight configuration of the FS01 grouped service. Here the elements have a specific weight according to their severity:

  • FS01 ALIVE: Critical for the FS01 service, since it is the virtual IP allocated to the first hard drive cluster. It has an individual weight of 2, since if it is down, the rest of the service elements will not work. There is no warning threshold, since its data is a Yes/No status.
  • DHCPserver ping: Critical for the FS01 service. It has an individual weight of 2. In this case, there is no warning threshold either.
  • Hard drives: They have an individual weight of 1 when they reach their critical threshold and 0.5 when they reach their warning threshold, so the hard drives alone only make the FS01 service critical if at least two of them are in critical status or all four are in warning status.
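The arithmetic behind the FS01 weights can be verified with a small Python sketch. The critical threshold of 2 is not stated explicitly; it is inferred from the remark that two critical drives or four drives in warning are needed. The drive names (`disk1` to `disk4`) and the function names are hypothetical, and any element not listed by name is treated as a hard drive.

```python
FS01_CRITICAL_AT = 2.0  # assumed: inferred from the hard drive description

# Named elements with their stated weights (no warning weight defined).
ELEMENT_WEIGHTS = {
    "FS01 ALIVE": {"critical": 2.0},
    "DHCPserver ping": {"critical": 2.0},
}
# Every other element is treated as a hard drive in this sketch.
DRIVE_WEIGHTS = {"critical": 1.0, "warning": 0.5}


def fs01_weight(statuses):
    """Add up the weight contributed by each element's current status."""
    total = 0.0
    for name, status in statuses.items():
        table = ELEMENT_WEIGHTS.get(name, DRIVE_WEIGHTS)
        total += table.get(status, 0.0)
    return total


def fs01_is_critical(statuses):
    return fs01_weight(statuses) >= FS01_CRITICAL_AT


four_warnings = {f"disk{i}": "warning" for i in range(1, 5)}
print(fs01_weight(four_warnings), fs01_is_critical(four_warnings))  # 2.0 True
```

One critical drive plus one drive in warning adds up to 1.5, which stays below the assumed critical threshold, whereas FS01 ALIVE failing on its own reaches it immediately.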


1.3 Pandora Server

Service monitoring requires the Prediction Service to be running properly and the Enterprise version of Pandora FMS to be installed.
