Searching for the Perfect Software or the Fewest Possible Bugs

As you know, our job here at Pandora FMS is monitoring, always in a flexible way, which in a few words means verifying that everything is going well. In a perfect world monitoring would not be necessary but, in a universe governed by quantum mechanics that we barely understand, there are plenty of things that can go wrong. A server without ECC RAM, for example, can have bits in its memory flipped by solar or cosmic radiation, with unpredictable results.


Memory with ECC

Because we know that “if there are many ways to do something, one of those ways will go wrong”, systems have always been built by adding redundancy, layer upon layer, to minimize undesirable events. In this article we will not look that far ahead; instead we will revisit key concepts we have covered before and then talk about Perfect Software: the search for it and how to verify whether it exists.

Perfect Software and Monitoring

One of Pandora FMS’s premises is to stay in permanent contact with our users, and that is what this blog is for: to keep you updated and to hear your concerns about our software and about monitoring in general. Since version 7 (New Generation) we have followed the path of Continuous Release, which minimizes the impact of updates by improving and correcting a few specific elements at a time, all for the sake of perfectibility. However, this is not sufficient, because of many internal and external factors.

Internal factors

We have already explained in detail how the Continuous Release of software works. It is a trend today among software houses in their pursuit of Perfect Software. Still, the more changes we make to a system, the greater the likelihood of adverse outcomes.

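In computing there are no errors as such, only exceptions. For example, we tend to assume that dividing any number by zero gives infinity. Strictly speaking the division is undefined in arithmetic, and in practice the outcome depends on the environment: an error may be raised, or floating-point arithmetic may return infinity or "not a number". In mathematics, however, the theory of limits can assign a fixed and exact value to some expressions that look like a division by zero: the limit of sin(x)/x as x approaches zero is exactly 1. That is an exception. We always program to avoid dividing by zero, but we rarely remember that the answer will not always be infinity, or some other result that makes no sense.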
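As a small illustration of the point about division by zero, here is a minimal Python sketch. The function names safe_ratio and sinc are ours, chosen for this example only. In Python, dividing by zero raises a ZeroDivisionError rather than returning infinity, and a function such as sin(x)/x needs its limit value supplied by hand at x = 0.

```python
import math


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Python raises an exception here instead of returning infinity.
        return math.nan


def sinc(x: float) -> float:
    """Compute sin(x) / x, using the limit value 1.0 at x == 0."""
    if x == 0.0:
        # The expression is 0/0 at this point, but its limit is exactly 1.
        return 1.0
    return math.sin(x) / x
```

Whether NaN is the right fallback is a design choice: a monitoring check might prefer to report "no data" instead of inventing a number.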

What we mean is that we never reach perfection, yet we are always willing to accept a very small risk: when a program or system gives us 99% satisfaction, we tend to say that “everything is fine”. Pandora FMS itself, with its dynamic monitoring, avoids burdening us with unnecessary alarms (it first spends a good period of time collecting values to discern what is normal, since what is normal in one company may not be normal in another).
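The idea of first collecting values and only then deciding what counts as normal can be sketched in a few lines of Python. This is not Pandora FMS's actual algorithm: the functions learn_baseline and is_anomalous, and the choice of mean plus or minus three standard deviations, are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def learn_baseline(samples, k=3.0):
    """Return (low, high) bounds as mean +/- k sample standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    center = mean(samples)
    spread = stdev(samples)
    return center - k * spread, center + k * spread


def is_anomalous(value, bounds):
    """Return True when value falls outside the learned (low, high) bounds."""
    low, high = bounds
    return value < low or value > high
```

With a history of CPU readings such as [40, 42, 41, 39, 43, 40, 41], a new reading of 41 stays inside the learned bounds while a reading of 95 falls outside them. Another company with a different history would learn different bounds from the same code.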

Of course, in this twenty-first century we rely not only on beta users: we automate testing as much as possible and, for greater confidence, more and more companies offer monetary rewards to anyone who discovers a vulnerability in their software.
A vulnerability is another of the exceptions we are talking about: on its own it will probably not bother users at all, and correcting it costs programmers and developers time and money. The problem lies in the bad intentions of third parties who seek some undue advantage from it. Generally, in these cases the independent bounty hunter reports the discovery privately and is paid to keep silent until the software house corrects the vulnerability; only then does the bounty hunter go public (and may even be celebrated in the media). As we can see, in those particular cases everyone wins.

External factors

As we said at the beginning, servers use more expensive RAM with Error-Correcting Code (ECC), which detects and corrects corrupted bits so that the CPU can trust the data it reads and writes. A power surge caused by lightning striking the power lines can also do unexpected damage, no matter how many protective devices you have. But natural phenomena are not the only ones to blame for setbacks in our work.
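To give a feel for how an error-correcting code repairs a flipped bit, here is a minimal Python sketch of the classic Hamming(7,4) code. It is only a small relative of the wider codes that ECC memory typically uses, and the function names are ours. Four data bits are stored with three parity bits, and any single flipped bit in the seven can be located and corrected.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data_bits, error_position)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if none.
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome
```

If a cosmic ray flips, say, the fifth bit of a stored codeword, the decoder reports position 5 and returns the original four data bits. Two simultaneous flips are beyond what this small code can repair.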

NASA and the perfect software

Since the dawn of computing, the U.S. National Aeronautics and Space Administration (or simply NASA) has relied on computers, and they proved particularly useful when, in the late 1960s, we reached the Moon. The mission that achieved the feat was Apollo 11, and the person chiefly responsible for developing the on-board flight software was Margaret Hamilton.


Margaret Hamilton working on the Apollo space mission

It was on July 20, 1969, at the moment of the Moon landing, that this brilliant woman’s concern for Perfect Software crystallized into a resounding success. The on-board computer was designed to use at most 85% of its computing capacity, to discern which tasks had priority and, in the event of a software glitch, to restart and continue with exactly the values it held at the moment of the failure.

At the time this was a real feat, and the best part was that the astronauts did not have to worry about any of it. All of it was transparent to the user… or almost. During the landing, the flight control center on Earth instructed the crew to switch on the rendezvous radar, which is used to bring two spacecraft together, so that it would be ready for the return. The idea was sound: if the Moon landing had to be abandoned, the radar would already be on and working to rejoin the command module orbiting about 110 kilometers above the Moon and bring our heroes back safe and sound.


Apollo 11 mission badge

The radar had three modes of operation: automatic, manual and synchronized with the on-board computer. It was thought that the first two modes had no effect on the landing calculations, and the crew did in fact operate it in manual mode… without knowing that, under certain very particular circumstances, the power supply that fed both pieces of equipment allowed the radar to inject “noise” that the on-board software interpreted as tracking data sent by the radar.

Since the on-board software knew that this data had nothing to do with the Moon landing, it continuously discarded it, and even restarted the computer several times, without the astronauts or the ground team noticing. And that would have been the end of it, had the CPU load not exceeded 85% and set off the alarms.

In total there were five alarms, which would have caused the mission to be aborted had the team not looked up the descriptions of codes 1201 and 1202… and learned that they simply indicated that less than 15% of computing capacity was left. Knowing this in real time, they completed their task for the glory of Humanity.
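The design principle described in this section (a fixed capacity budget, tasks ranked by priority, and low-priority work dropped under overload) can be sketched in Python as follows. This is not the Apollo flight software: the Task class, the schedule function, the task names and the load figures are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int  # lower number means more important
    load: float    # fraction of computing capacity needed per cycle


def schedule(tasks, budget=0.85):
    """Admit tasks in priority order; shed any task that would exceed budget."""
    admitted, shed = [], []
    used = 0.0
    for task in sorted(tasks, key=lambda t: t.priority):
        if used + task.load <= budget:
            admitted.append(task)
            used += task.load
        else:
            shed.append(task)
    return admitted, shed, used
```

Given hypothetical tasks for guidance (priority 1, load 0.5), display (priority 2, load 0.25), telemetry (priority 3, load 0.05) and spurious radar data (priority 9, load 0.25), the first three are admitted and the radar data is shed, so the important work keeps running within the 85% budget.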

Free Software and Perfect Software

We regret to disappoint you: they are not the same, nor are they synonyms. The only advantage of Free Software is that we can analyze the source code ourselves, repair it and even improve it, always under our own responsibility and, depending on the license, with the obligation to share our modifications. Free software is therefore a tool or model that paves the way toward software perfectibility, which, together with software engineering, may someday be achieved. This is precisely the path Pandora FMS has bet on since 2004.

Our Destiny: Leaving Earth

When we go into space we will not have the benefit of continuous software releases to deliver the necessary patches. Even then, we will have to get used to the idea of monitoring the most diverse devices. When we visit other worlds, we will surely count on monitoring to achieve our goals, no matter how ambitious they may seem today.

Questions, comments? Contact us! Do you have a story or anecdote about the search for Perfect Software? Leave your comments below.