What is a post mortem incident? How can we monitor this?
A computer post-mortem incident contains very important lessons!
In particular, I very much liked the article that our colleague Sara Martin wrote on the Pandora FMS blog about crisis management in information technology. These are its steps:
- Know how to apologize.
- Publicize the problem and stress that it is being solved.
- Be transparent.
- Good communication.
- After the crisis.
Post Mortem Incident
Legend: “Jack’s Lantern” (https://commons.wikimedia.org/wiki/File:Jack-o-lantern.svg)
This article starts from point number five: when, after a certain recovery time, the crisis has been solved and becomes a post mortem incident. The term comes from Latin and means “after death”. Now that October is here and Halloween is coming, you should be careful! On the other hand, it is more “fun” to use the words “incident post mortem”, because they add mystery and interest to the matter…
Monitoring is just one field in which we can learn from mistakes; the U.S. space agency (NASA) has its own “My Best Mistake” section on the APPEL (Academy of Program/Project and Engineering Leadership) blog. Big companies can also face problems; I have already written about the case of a giant company that had to handle an IT crisis and came out of it very successfully. That incident did not, in fact, represent a real danger to the company’s reputation; however, it did serve to review work procedures and techniques, which had most likely been put in place by personnel who had already left the company by the time of the incident.
Other cases are more serious, such as what happened to an airline in 2017; in this last example the post mortem incident is evident because the decisions were made, or rather ceased to be made, by senior management. My point is that, whether employees or senior management are involved, in a post-mortem incident we seek to establish what cause or set of causes led to the problem, without seeking blame. This does not mean that other entities, such as governments or even a shareholders’ meeting, will not conduct their own investigation; if we document our incidents properly, they will not even have to call or contact us. Let’s see.
Experience is the name we give to our own mistakes
Legend: Always presume good faith in a post-mortem incident (https://commons.wikimedia.org/wiki/File:Wiki-halo.png)
A few tips for approaching a post mortem incident:
- As in Wikipedia, let’s always assume that people acted in good faith.
- Complex systems (especially now that we have a multitude of devices with very different operating systems) make it unlikely that there is a single root cause; let’s investigate what really happened and identify the different causes, not the culprits.
- Let’s take our time: this is a marathon, not a 100-meter race. As human beings we are reluctant to change; we take a long time to recognize our mistakes, tend to hide them or, worse yet, blame others for them.
Post mortem incident monitoring
A system administrator who has been working in a company or organization for a long time is able to anticipate which cases and conditions deserve an alert in Pandora FMS.
Yes, we know, many other software products also have this feature, but here we have a powerful tool for Customer Service Centers (CSC): Integria IMS. Pandora FMS includes the preconfigured actions and commands needed for this integration. Even if the problem recurs, we will be able to carry out its incident post mortem, assigning other people to tackle the case and follow it over the long term. And not only that: with Integria IMS we will be able to quickly generate reports on the weak points of the service, and to add users with roles by profile and group using the information obtained from Pandora FMS. This makes it easier, in the future, to add third parties or companies to our post-mortem incidents. There are also other add-ons for Pandora FMS that offer a similar service.
Another important point is the timeline of events in our post mortem incident. For this, Pandora FMS has an excellent collection of records, which we can view in consoles created for that purpose; if this is still not enough, we can search directly in Elasticsearch over the data stored there by Pandora FMS. There are many possible combinations and working methods for the task at hand!
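To make the idea of a timeline more concrete, here is a minimal sketch in Python. It does not use any Pandora FMS or Elasticsearch API: the `Event` class, its fields, the helper functions and the sample records are all hypothetical, invented only for illustration. It assumes the records have already been exported from whatever tool stores them, and simply filters them to the incident window and orders them by time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One record collected during the incident (hypothetical shape)."""
    timestamp: datetime
    source: str   # e.g. a host or module name; illustrative only
    message: str


def build_timeline(events, start, end):
    """Return the events inside [start, end], ordered by time."""
    in_window = [e for e in events if start <= e.timestamp <= end]
    return sorted(in_window, key=lambda e: e.timestamp)


def format_timeline(timeline):
    """Render each event with its offset in minutes from the first one."""
    if not timeline:
        return []
    t0 = timeline[0].timestamp
    lines = []
    for e in timeline:
        offset_min = int((e.timestamp - t0).total_seconds() // 60)
        lines.append(f"+{offset_min:>3} min  [{e.source}] {e.message}")
    return lines


# Made-up sample records, only to show the idea.
events = [
    Event(datetime(2018, 10, 1, 9, 42), "web-01", "HTTP latency above threshold"),
    Event(datetime(2018, 10, 1, 9, 30), "db-01", "Disk usage at 95%"),
    Event(datetime(2018, 10, 1, 10, 15), "web-01", "Service restored"),
    Event(datetime(2018, 10, 1, 7, 0), "db-01", "Nightly backup finished"),
]

timeline = build_timeline(
    events,
    start=datetime(2018, 10, 1, 9, 0),
    end=datetime(2018, 10, 1, 11, 0),
)
for line in format_timeline(timeline):
    print(line)
```

Showing offsets from the first event, rather than absolute times, keeps the focus on the sequence of causes instead of on who was on duty at a given hour, which fits the blameless approach described earlier.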
Post mortem incident on sunny days
For the whole planet Earth, the sun is life. A sunny day is a happy day in our minds, and a post-mortem incident can, and should, also take place when a project has come to a happy conclusion or has produced results that far exceeded expectations or calculations.
Again, as human beings we always think about the bad and the negative, and we don’t stop to think about how good events or results came about so that we can repeat them in the future or reuse parts or components of them in other projects. Let’s be optimistic!
Before saying goodbye, remember that Pandora FMS is flexible monitoring software, capable of monitoring devices, infrastructures, applications, services and business processes.
Do you want to know better what Pandora FMS can offer you? Find out here.
If you have more than 100 devices to monitor you can contact the Pandora FMS team through the following form.
Also, remember that if your monitoring needs are more limited, you have the open source version of Pandora FMS at your disposal. Find more information here.
Don’t hesitate to send your queries. Our Pandora FMS team will be delighted to assist you!