Pandora: Documentation en: Introduction
1 Introduction
1.1 Pandora FMS: but what is it, exactly?
Pandora FMS is monitoring software intended for all types of environments. Using the word ‘monitoring’ in its broad sense is somewhat risky, as there are hundreds of tools available, each adapted to a particular type of environment: monitoring a couple of printers in a small office is not the same as monitoring thousands of interfaces and switches under extremely high network traffic in a data center with thousands of servers.
Pandora FMS is designed to adapt to every role and every organization. Its main aim is to be flexible enough to manage and control the complete infrastructure, with no further need to invest time or money in other monitoring tools.
FMS is an acronym for Flexible Monitoring System. Its purpose is to monitor, on a single platform, both complex new-generation tools and systems and outdated elements that are difficult to access and offer limited compatibility.
Pandora FMS currently has agents for every ‘modern’ operating system on the market, from Windows NT4 to Windows 2008, as well as all the modern Unix systems (AIX, Solaris, HP-UX, BSD, Linux), in every version and distribution.
Pandora FMS can, of course, be used not only as a systems monitoring tool but also with all sorts of network devices, whether by using SNMP (versions 1, 2 and 3) or via TCP protocol probes (SNMP, FTP, DNS, HTTP, HTTPS, etc.), ICMP or UDP.
1.2 About the documentation
All of this power and flexibility comes with an implicit difficulty at the setup stage. Although Pandora’s configuration is mostly graphical, we are aware that learning how to use it can seem a little complicated at first. That is why we have divided the 800 pages of the user’s guide into several chapters:
- Chapter I. Understanding Pandora FMS.
- Chapter II. Installation and Configuration.
- Chapter III. Monitoring with Pandora FMS.
- Chapter IV. Operating and Managing Pandora FMS.
- Chapter V. Complex Environments and Best Performance.
- Chapters VI and VII. References and Technical Appendices.
Besides the official documentation, you can use our user forum at http://openideas.info/smf, where you can post queries to other users in English, Spanish and Japanese. Should you require official training, there is an official training program taught by the developers of Pandora.
We have compiled some quick reference guides to help you configure Pandora FMS and implement simple monitoring tasks with it. There are also quick reference manuals for installing the software agents on systems such as Windows and Linux. Short videos are available to walk you through some of the most technical parts of the configuration and, if necessary, you can attend regular, scheduled workshops. More detailed information on all of the above can be found on our website, at http://pandorafms.com
1.3 The evolution of Pandora as a Project
Pandora was created by Sancho Lerena in 2003. Since then it has gradually evolved to become the resilient, innovative and flexible monitoring tool we offer today.
Pandora was originally written as 100% open source code. Once the experimental early years had passed, and following strong demand for the product from large companies and corporations, we felt compelled to launch the Enterprise version. This version offers specific features designed for environments that need to process large volumes of information and work with thousands of devices.
The company financing and coordinating all the work behind Pandora FMS’s development is Artica Soluciones Tecnologicas, a Spanish company founded in 2005 by Pandora’s creator. The open source version is nonetheless fully operational and functional as a production tool; in fact, most companies that do not require professional support, or that are very well staffed, get by well with the Open Source version.
To this day, Pandora FMS can be found among Sourceforge’s top-rated projects, with thousands of downloads and satisfied users all over the world. For more information on Pandora’s evolution and to see a road map of the project, please visit: http://pandorafms.com
1.4 A quick glance at Pandora FMS features
- Autodiscovery. Locally, the default plugins of Pandora’s agents can detect hard disks, partitions and databases on a data server, among many other things.
- Autoexploration. Remotely, over the network, it can detect active systems, catalogue them according to their operating system and, by applying a profile, start monitoring them. It can even detect the network topology and build a network map based on the routes discovered.
- Monitoring. Pandora FMS’s agents are the most powerful on the market. They can obtain information by any means, from executing a command to calling the Windows API at its lowest level: events, logs, numerical data, process states, memory and CPU consumption. Pandora comes with a default library of monitors, but one of its greatest advantages is how easily new monitors can be created and added.
- Remote access. The agents themselves can start services, delete temporary files or execute processes. Commands such as stopping or starting services can also be executed remotely from the console, and tasks can be scheduled for periodic execution. It is also possible to use Pandora FMS to access Windows machines remotely (via VNC), or to access network devices or Unix systems through Telnet or SSH, from a web interface.
- Alerts and Notifications. Notification is just as important as failure detection. Pandora gives you an almost infinite variety of notification methods and formats, including, but not limited to, escalation, alert correlation, and the prevention and mitigation of cascading events.
- Analysis and Visualization. Monitoring is not just receiving a trap or displaying a downed service; it is also presenting forecast reports and correlated summary charts of data gathered over the long term, generating user portals, delegating reports to third parties and defining your own charts and tables. Pandora incorporates all of this in a web interface.
- Inventory. Unlike other solutions, where a CMDB is mandatory, in Pandora it is just an option. The inventory is flexible and dynamic (it can auto-discover, it accepts remote input, etc.). It can notify you of changes (e.g. uninstalled software) or simply be used to produce listings.
2 Introduction to Monitoring
Every technical manual on software will tell you, right from the start, about configuration, text files, databases, protocols, etc. We very often learn to configure a tool at a low level while remaining ignorant of its full potential, that is, of what can be done with it and in which situations. The purpose of this section is to explain, in a concise but systematic way, the theory behind monitoring, regardless of the software used for that purpose.
2.1 Types of Monitoring
When we talk about ‘how a certain element is’, be it a server, a database, a network element or a refrigerator, we can ask ourselves the following questions:
- How do we obtain the information? Do we have something in place on the element itself, or do we have to ask it from outside?
- Do we want to ask constantly, or to wait for something to happen?
- What sort of information does it give us? Is it something we can plot on a graph to watch its progression?
These three questions correspond to the three key points that shape our monitoring model. The first question determines whether we are going to use agent-based monitoring, executed inside the device we are monitoring, or whether, on the contrary, everything is done externally, over the network. There are monitoring systems that operate one way or the other, and devices that can only be monitored in one way. Pandora FMS supports all models.
The second question concerns whether the monitoring is synchronous (the element is queried every X seconds, regardless of whether the information has changed) or asynchronous (information is received only when something relevant has taken place). If I use synchronous monitoring with 10 million elements, collecting data at 5-minute intervals, the load will be considerable; if I collect every 50 minutes instead, it will be much more manageable, the downside being that if something happens in between, it can take up to 50 minutes before I notice. If I use asynchronous monitoring (e.g. with SNMP traps or logs) I can save a lot of resources, but I will not be able to draw graphs or keep historical charts, except those directly related to the incidents that occurred. Many tools are based solely on one of the two models: some are known as ‘performance’ or ‘capacity’ tools, while others are based on event management. Their functions are rarely interchangeable. Pandora FMS supports both approaches.
The third question refers to what we are looking for at a given moment: it can be a text string (a descriptive event), a floating-point number (to be able to draw graphs) or simply a status (down, up). Being able to work with different kinds of data allows more flexibility. Pandora FMS supports all types of data.
These three "paradigms" greatly condition the environment, as well as the tool chosen to monitor it. Identify the type of information you need and the best approach to obtain it, then plan around the available information elements and how to monitor them.
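To put rough numbers on the polling-interval trade-off described above, here is a minimal arithmetic sketch. It assumes the checks are spread evenly over the interval, which is a simplification; the function name is illustrative and not part of Pandora FMS.

```python
def checks_per_second(elements: int, interval_seconds: float) -> float:
    """Average number of checks the server must run each second."""
    return elements / interval_seconds


ELEMENTS = 10_000_000  # figure taken from the text above

for minutes in (5, 50):
    rate = checks_per_second(ELEMENTS, minutes * 60)
    # With synchronous polling, the worst-case detection delay
    # is one full polling interval.
    print(f"every {minutes:>2} min: {rate:,.0f} checks/s, "
          f"worst-case detection delay {minutes} min")
```

Under this assumption, the 5-minute interval means roughly 33,000 checks per second, while the 50-minute interval brings that down to about 3,300, at the cost of a ten times longer blind spot.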
2.2 Remote monitoring
When we speak of remote monitoring, we mean that it is the Pandora FMS server that probes (‘polls’), synchronously, the devices it monitors. In this mode we are not referring to ‘local’ monitoring, which is based on agents installed on the devices we wish to observe.
Generally speaking, when we monitor remotely, we do it for two different purposes:
- To make sure the elements are ‘alive’ (e.g. an interface or an active system)
- To obtain a numerical value (e.g. to measure network traffic or the number of active connections)
This monitoring, when synchronous, is always done in the same direction: from the monitoring server to the element monitored.
We may also be interested in the opposite process, that is, in receiving a warning when an incident occurs. That is asynchronous monitoring and, in the case of remote monitoring, it usually takes the form of SNMP traps.
Synchronous monitoring is usually done using the SNMP protocol, which is the most widely used in network equipment, but it can also be done through WMI, a similar protocol owned by Microsoft.
Both protocols work in a similar fashion, which is basically as follows: a server sends a request for a particular element to the ‘SNMP agent’ or ‘WMI service’ listening on the device. In SNMP this element is identified by an OID; in WMI it is identified by a WQL query. It may refer to the free memory available, the router’s number of connections, or the traffic on a given interface.
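As an illustration of the ‘traffic on a given interface’ case: interface traffic is typically exposed over SNMP as an ever-increasing octet counter (for example `ifInOctets` in the standard IF-MIB, OID 1.3.6.1.2.1.2.2.1.10.<ifIndex>), so a rate has to be derived from two successive polls. The sketch below shows only that calculation; it does not perform any SNMP request, and the function name and sample values are made up for the example.

```python
def counter_rate_bps(previous: int, current: int, seconds: float,
                     counter_bits: int = 32) -> float:
    """Bits per second between two octet-counter samples.

    Handles a single wrap-around of the counter between the two polls.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    modulus = 1 << counter_bits
    delta_octets = (current - previous) % modulus  # wrap-safe difference
    return delta_octets * 8 / seconds


# Two hypothetical samples taken 300 seconds (one polling interval) apart
print(counter_rate_bps(previous=1_000_000, current=4_750_000, seconds=300))
```

The modulo step matters because a 32-bit counter on a busy interface can wrap between polls; if it can wrap more than once per interval, the interval is too long for that counter.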
If your monitoring is mainly based on network environments, it is important to know SNMP in detail, as it will be the most widely used function of the monitoring tool. Asynchronous monitoring through SNMP is also vital. Together with a monitoring tool you will need an external explorer of SNMP devices, access to the MIB collections from the makers of your network devices (which are like libraries of OIDs) and, of course, a lot of patience to investigate: each device usually has its own collection of OIDs, but you will only be interested in a few of the thousands of elements each device has.
If your monitoring is based on Windows servers and you are not interested in installing agents on the machines, remote WMI monitoring is powerful and well suited. The WMI interface is even more powerful, and better organized, than SNMP’s; with WMI you will be able to obtain practically any data, status or event on your Windows servers.
Unix and Windows systems can also be probed with SNMP, but the information returned is scarce; besides, you will need to activate and configure the operating system’s SNMP agents, which can be much more complex than simply installing a Pandora FMS monitoring agent.
Finally, you can always monitor network elements through TCP or ICMP tests. ICMP is used mainly for two purposes:
- To verify if a system responds (ping)
- To find out the latency of that device (in milliseconds)
Through TCP tests, it is possible to test whether a web server responds properly, or whether a mail server (SMTP) delivers mail correctly. These types of tests are intended not just to check that the server ‘opens the port’ but also that it ‘communicates’: that is, that the command for sending mail receives an OK in confirmation, or that the answer from the web server is ‘200 OK’ (a valid reply in the HTTP protocol).
There is a series of default plugins for TCP testing, but you can easily implement your own tests by adapting existing scripts or developing new ones. Integration with Pandora FMS does not require an ‘API’, complex structures or proprietary libraries.
Web transaction monitoring, which is also remote monitoring, is covered in a separate chapter, given its importance.
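The difference between ‘opening the port’ and ‘communicating’ can be sketched with a few lines of standard-library Python. This is only an illustration of the idea, not one of Pandora FMS’s own plugins; the function names and the choice of a `HEAD /` request are assumptions made for the example.

```python
import socket


def status_line_is_ok(response: bytes) -> bool:
    """True if the first line of an HTTP response carries status code 200."""
    first_line = response.split(b"\r\n", 1)[0]
    parts = first_line.split(None, 2)  # e.g. [b"HTTP/1.1", b"200", b"OK"]
    return (len(parts) >= 2
            and parts[0].startswith(b"HTTP/")
            and parts[1] == b"200")


def http_check(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Open a TCP connection, send a minimal request and validate the reply."""
    request = (f"HEAD / HTTP/1.1\r\nHost: {host}\r\n"
               "Connection: close\r\n\r\n").encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            reply = conn.recv(1024)
    except OSError:
        return False  # refused, unreachable or timed out
    return status_line_is_ok(reply)
```

A server that accepts the connection but answers with an error status, or with nothing at all, fails this check even though its port is open, which is exactly the distinction made above.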
2.3 Local Monitoring (with agents)
When it comes to systems and applications, the best way to obtain information is, without a doubt, from the system itself. This is done by executing commands, or by querying the system’s data sources, on the same machine we want to monitor. This means we have to execute a command or script, or query the system or the application. To that end we use Pandora’s monitoring agent, a specific piece of software that takes care of those small monitoring tasks.
The agents can only be installed on Unix and Windows operating systems. An agent cannot be installed on a Cisco device, for example. In Pandora FMS nomenclature, ‘agent’ refers to the entity containing the information, and ‘software agent’ to the piece of software installed on the system to extract information and report it to the Pandora FMS server. The software agent runs constantly on the system (as a service) and reports information periodically.
Besides obtaining information through commands, the agents allow you to do more, such as obtaining inventory information. They can also be configured to react to a problem or failure by interacting automatically with the system, for instance by deleting a temporary file or executing a given command.
To obtain the ‘precise’ and ‘specific’ information we are interested in, we will often have to refer to the manuals of the application we want to monitor because, even when ‘generic’ monitors are available, what we are looking for may not be so trivial.
Windows offers an almost infinite variety of ways to access the information: WMI, Perfcounters, Eventlog, system logs, registry, commands, PowerShell scripts, API (on Windows NT), etc. In fact, Microsoft’s architecture is one of the easiest, most powerful and best documented when it comes to obtaining information from the system. On Unix/Linux systems, the software agent’s ability to execute any command allows us to benefit from the full power of the shell.
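The idea of a local check, running a command on the monitored machine and turning its output into a value, can be sketched as follows. This is not the Pandora FMS software agent or its configuration format; `run_numeric_check` is a hypothetical helper, and the Python interpreter stands in for whatever shell command you would really run.

```python
import subprocess
import sys


def run_numeric_check(command: list[str], timeout: float = 10.0) -> float:
    """Run a local command and return its first line of output as a number."""
    result = subprocess.run(command, capture_output=True, text=True,
                            timeout=timeout, check=True)
    # Raises if the command prints nothing or prints a non-numeric first line.
    return float(result.stdout.strip().splitlines()[0])


# Hypothetical check: count the entries in the current directory
value = run_numeric_check(
    [sys.executable, "-c", "import os; print(len(os.listdir('.')))"]
)
print(value)
```

Because the check is just a command, anything the shell can compute (a process count, a queue length, a value extracted from a log) can become a monitored number in the same way.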
2.4 Monitoring procedure
Why do you really want to monitor? Have you given it some thought? That is, once you have obtained data from your servers (when they go down, or how much they consume), have you thought about what you will do with it? By first asking yourself ‘what is the most critical?’, you may save a lot of time that would otherwise be spent ‘messing around’ or investigating issues that will not be useful later in your day-to-day work.
Please dedicate five minutes to answering some questions. In your case, which of these best describes your monitoring needs?
- To avoid losses -> Availability.
- To analyze degradations -> Performance.
- To evaluate growth -> Capacity planning.
In each of those cases the focus will be on certain aspects.
Availability, in the first case. You are mostly interested in event-based monitoring, and remote monitoring will probably be enough for your needs; it is faster to deploy and will give you relatively quick results. You are after SLA reports.
Performance, in the second case. Your field is graphs and numbers, information obtained through agents or remotely, although you will probably require agents to get in-depth information on your systems. Group reports and combined graphs are your main priority.
Capacity planning, in the third case. Much more specific, it needs to obtain data as in the second case, but then work with it, using predictive monitors and very specialized projection reports. Establishing early alerts will be of great help, and you will need a good understanding of what the WARNING and CRITICAL statuses mean, besides drawing up serial event management policies to prevent the problem before it happens. This is, without a doubt, the most complex and interesting case.
Once you know which model you are going to follow, you are left to wonder what to do when the system tells you the service is down or, worse still, what will happen when the server’s capacity reaches its limit next Friday. You need to think about action procedures.
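To give a feel for the kind of projection that capacity planning relies on, here is a minimal sketch that fits a straight line to past usage samples and estimates when a limit will be reached. It is a plain least-squares trend, not Pandora FMS’s predictive monitors; the function name and the sample data are made up for the example.

```python
from typing import Optional


def days_until_limit(samples: list[tuple[float, float]],
                     limit: float) -> Optional[float]:
    """Least-squares linear trend over (day, value) samples.

    Returns the number of days after the last sample at which the trend
    reaches `limit`, or None if the trend is flat or decreasing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, day_at_limit - last_day)


usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]  # made-up disk usage, in percent
print(days_until_limit(usage, limit=90.0))
```

With these made-up samples, usage grows by two points a day, so the 90% mark is seven days after the last sample. Real data is rarely this linear, which is why early WARNING thresholds remain useful alongside any projection.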
2.5 Action procedures
We use ‘action procedures’ to refer to something that cannot (so far) be achieved through any tool: basically, thinking about and planning how incidents are to be notified. In order to do that you will have to consider several factors:
- Event urgency. You must be able to distinguish between something unusual and something critical.
- Notification format. E-mail, SMS or, why not, a mild shock to the operator to stop him from falling asleep (we are yet to implement it, but it would not be a difficult thing to do…)
- Escalation. It basically consists of notifying someone first, in the hope of getting the incident solved; if it is not, a second person is informed, and if the problem persists a third becomes involved, usually someone respected by the other two.
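The escalation idea in the last point can be written down as a small data structure before any tool is configured. The sketch below is only a planning aid under assumed values: the roles, the delays and the names `EscalationStep` and `contacts_to_notify` are hypothetical and do not correspond to Pandora FMS’s alert configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes the incident has gone unresolved
    contact: str        # who gets notified at this step


# Hypothetical policy; roles and delays are illustrative only
POLICY = (
    EscalationStep(0, "on-call operator"),
    EscalationStep(15, "system administrator"),
    EscalationStep(45, "infrastructure manager"),
)


def contacts_to_notify(minutes_unresolved: int,
                       policy: tuple[EscalationStep, ...] = POLICY) -> list[str]:
    """Everyone who should have been notified by now, in escalation order."""
    ordered = sorted(policy, key=lambda step: step.after_minutes)
    return [s.contact for s in ordered if s.after_minutes <= minutes_unresolved]


print(contacts_to_notify(20))
```

Writing the policy down this way forces the questions that matter: who is first, how long each person gets before the next one is involved, and who is last in line.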
Ideally, you should have those concepts in mind before any configuring is done. Even better, gather some patience and a drawing tool (Visio, opendraw…), draw your critical elements, and use arrows to show how the information is obtained and who will be notified or what will be done with that information.
By focusing first on the most critical, you find the starting point: what matters most to your organization. Once you know what is most critical, you can work out how to monitor it and, in the meantime, think about who is responsible for problems in those systems and how to notify them.
2.6 Supervision models
By supervision models we refer to the fact that a monitoring system is designed to report information automatically but is, in fact, watched by a human being, directly or indirectly. This person is often called the operator: the person who looks at the screen or otherwise receives the events, be it on a smartphone or similar device, by e-mail, or through logs registered with another tool. How it is done does not matter; the important thing is to have someone minding the system.
There are other people, on the other hand, whom we call ‘system administrators’ in general, or ‘infrastructure personnel’. These are the people who, when something happens, receive a call from the operator saying ‘hey, we have got a problem here’, or a direct notification sent automatically by the system to warn them of something, which in most cases is an SMS or an e-mail.
Here we can see a great difference already:
- The direct supervision model implies a person, or several people, constantly watching the system, so that if something critical occurs it is detected immediately. Such a person can usually notice small, non-critical changes too, which gives much greater flexibility. It is not necessary to define ‘notifications’ (alerts, in Pandora) for each possible case; it is enough to look at the events (through some sort of viewer that shows status changes) to have an idea of ‘what is cooking’ in the system at any given time. It is possible to define many screens, and also to define alerts to support that supervision. This model is used in large environments because, no matter how thoroughly we define an alert policy, perfect autonomous supervision can never be guaranteed.
- The indirect supervision model implies that no one is permanently looking at the screen, so the automatic notifications (alerts) the system will send must be defined beforehand, given that the events, graphs and maps will not be seen by anyone. This model is suitable when we have few devices, or when we have identified very precisely what is critical and how to confront the problem (solution and notification).
For teamwork involving operators, administrators and third-level personnel, Pandora FMS tools such as event ticketing, incident creation, notification escalation, internal mail, a notice board and chat among Pandora FMS users are very useful.
2.7 And what now?
The following chapters are dedicated exclusively to Pandora FMS. Up to this point we have been discussing general matters that were important for you to know before exploring Pandora FMS further. You probably know many of these things already and may have used other monitoring programs; perhaps you have heard that this or that application is always monitored in a certain way because that is the best possible way.
Maybe, but in our experience each client does things a certain way and, regardless of how much we know about monitoring, we doubt we know more than you do about how your infrastructure has been set up. Monitoring easy things presents no problems; the hard, non-trivial job is to adapt the monitoring to your business without having to adapt your business to the monitoring. More than 800 pages await if you are to discover the best way to monitor your organization with Pandora FMS: a real challenge.