Cherwell IT Service Management Blog
Resources, Best Practices, and Solutions for ITSM Pros

How Service Desks Can Anticipate Failures and Disasters

Help Desk, ITSM

During the last few years, there seems to have been a large increase in disasters and calamities around the world, from forest fires to floods and from tsunamis to earthquakes. It was reported that on one island during the 2004 tsunami, most of the animals survived because they instinctively headed for higher ground, and the local population wisely followed them and also survived.

Why did the islanders follow them? Because they had learned long ago to follow the animals. Why did the animals go to higher ground? There are many theories, from the change in air pressure to seismic rumblings that humans cannot detect. The investigation and experimentation continue, but why didn't the more populated islands do the same? Why did they not allow some animals to roam freely on those Indian Ocean islands to act as a warning signal? This is not meant to be a flippant question. It's just surprising how quickly we lose inherited knowledge. There were probably some people on those islands who remembered the old warnings about animal movements but were simply ignored.

How does this relate to ITSM? Anyone who has worked on a service desk has probably had the experience of sensing that an IT disaster is looming without knowing why. Perhaps there is no tangible sign, just a feeling or observation that all is not well and things are about to get worse. In the old days of mainframe batch processing computers, such as the IBM 7010, 1400, and 360 series, computer operators knew the sounds of their machines so well that they could detect a failure from subtle changes in those sounds.

I remember one incident from those days when we got "buzzed" in the lunch room by one of the operators asking us to come back to the computer room immediately. "What's wrong?" we asked. He was not sure, but he knew it didn't sound right. He was correct. In the thirty seconds it took us to get to the computer room, applications were collapsing everywhere. It turned out to be a disc failure. Our trust in him was implicit because operators were that in tune with their computers.

Unfortunately, we no longer have that rapport with our computer systems. However, we still have our service desk staff and some hands-on technicians, such as network managers. Just as the tsunami investigators are asking why the animals headed for the hills and whether those instincts can be replicated, IT can start a program to capture early warnings of IT failures. Often, it is not how you handle a disaster but how you act in its aftermath that matters.

So what should we do? Every significant outage or failure should have an inquest into why it went wrong and how the same or a similar outage can be prevented in future. Part of that inquest should be to ask what the early warning signs were. Each service desk agent should prepare a short report of their observations and feelings from immediately before the disaster or significant outage became apparent. These reports can then be used to help spot impending disasters and other outages in the future.

Don’t lose this valuable learned knowledge. Turn IT folklore into useful knowledge!