What Is Incident Management?
Posted on October 03, 2018
Matt Klassen is the vice president of product marketing at Cherwell. He is passionate about enabling enterprises to accelerate their digital journey through better software and better service. Matt has 25 years of experience in developing, architecting, selling, and marketing enterprise software solutions for IT and product teams.
Incident Management might be the most important process in ITIL®. Organizations invest time and effort in developing a service strategy, designing services that further the business needs of the organization, turning those designs into real services that users can operate, and maintaining those services for the users who engage with them each day. ITIL's Incident Management process helps organizations get the best possible return on those efforts by resolving issues that affect the availability of services. That's exactly why it's typically one of the first ITIL processes organizations adopt.
In this guide, we'll look at ITIL's Incident Management process in detail. Beginning with a definition and objective statement for the process, we'll look at how ITIL defines the process flow, see how the support team works together to resolve IT incidents, and learn how the success of the process within an enterprise can be measured using key performance indicators (KPIs). Finally, we'll examine how new integrated service management software facilitates automation and helps organizations establish a consolidated service desk and resolve incidents more efficiently.
What Is Incident Management in ITIL?
In ITIL, the term "incident" describes an unplanned interruption to, or reduction in the quality of, an IT service; incidents can be tremendously costly for large organizations. The primary objective of the Incident Management process is to return service to users as quickly as possible when an interruption occurs.
Along with basic request fulfillment, Incident Management is one of the most important processes that IT organizations manage each day. While the request fulfillment process is used to address standard user requests like changing a password, Incident Management addresses genuine service outages with the goal of resolving the outage and returning service to users as quickly as possible.
In the five-stage service lifecycle model used in ITIL, Incident Management falls under "Service Operation." This is the fourth stage of the service lifecycle and the one where a service is already in operation by the organization. The process helps an organization extract the maximum value from the services and applications it supports by safeguarding their performance, availability, and user access.
What Are the Incident Management Process and Workflows?
Incident Management is the process that IT organizations follow to manage the lifecycle of incidents that are reported. That process consists of several steps, often known as sub-processes, that must all be carried out to ensure that incidents are adequately resolved and documented. Below, we describe each of the sub-processes and what they achieve for the organization.
Incident Management Support - The goal of Incident Management support is to provide and maintain the tools, processes, skills, and rules needed for effective and efficient handling of incidents. This sub-process helps ensure that service desk agents and technicians have adequate education and training to respond to and resolve incidents that occur within the IT organization. It also maintains the rules and workflows for processing and resolving incidents, so that technicians always know the next step toward resolving an incident.
Incident Logging and Categorization - The objective of this sub-process is to record and prioritize incident reports with the appropriate diligence to facilitate a swift and effective resolution. Organizations often have limited resources for resolving incidents and other IT issues, so effective prioritization of inbound incident reports is crucial to ensure that labor is allocated to the highest-priority incidents first. IT organizations need to be proficient at determining the scope and severity of a reported incident and prioritizing it accordingly. Incident logging and categorization are often automated, for example when an IT operations monitoring solution creates an incident in response to a performance or availability event.
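ITIL practice commonly derives an incident's priority from two inputs: impact (how widely the business is affected) and urgency (how quickly a resolution is needed). The minimal Python sketch below illustrates that idea. The three-point scale, the additive scoring rule, and the `P1` to `P5` labels are our own illustrative assumptions, not values prescribed by ITIL or by any particular service desk tool.

```python
from enum import IntEnum


class Level(IntEnum):
    """Illustrative three-point scale shared by impact and urgency (an assumption)."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical priority labels keyed by impact + urgency (2 = most severe, 6 = least).
PRIORITY_BY_SCORE = {
    2: "P1 Critical",
    3: "P2 High",
    4: "P3 Medium",
    5: "P4 Low",
    6: "P5 Planning",
}


def assign_priority(impact: Level, urgency: Level) -> str:
    """Derive a priority label from impact and urgency with a simple additive matrix."""
    return PRIORITY_BY_SCORE[impact + urgency]


if __name__ == "__main__":
    # A company-wide email outage that must be fixed immediately.
    print(assign_priority(Level.HIGH, Level.HIGH))  # P1 Critical
    # A single user's cosmetic display glitch that can wait.
    print(assign_priority(Level.LOW, Level.LOW))    # P5 Planning
```

Many organizations use an explicit lookup table instead of a sum so they can tune individual cells; the additive rule is just the shortest way to show the matrix.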
Immediate Incident Resolution by 1st-Level Support - A user who reports an incident to the service desk typically reaches a 1st-level service technician first. The ideal outcome is that the 1st-level technician addresses the incident and restores the IT service on the first call, within a target resolution time set by the IT organization. When an incident cannot be resolved within the target time, or when resolving it requires more specialized technical knowledge, the incident is escalated and a 2nd-level support technician takes over.
Incident Resolution by 2nd-Level Support - Once an incident has been escalated because 1st-level support could not resolve it on the first call, a 2nd-level support technician takes over and begins searching for a workaround to restore service as quickly as possible. At this level, the technician has the flexibility to involve other support groups or third-party suppliers in the resolution of the incident. If the incident is due to a malfunctioning application, for example, the 2nd-level technician may contact the company that developed the application for additional guidance. If the root cause of the incident cannot be corrected, the 2nd-level support technician can create a Problem Record and transfer the incident to the Problem Management process and team.
Handling of Major Incidents - Earlier, we mentioned the importance of prioritizing incidents according to their urgency so that resources can be deployed most efficiently. Major incidents are the highest-priority IT incidents that an organization can recognize: they constitute serious interruptions of, or threats to, business activities and must be resolved with the utmost urgency to prevent financial losses or other critical consequences. Major incidents are escalated rapidly through 1st-level and 2nd-level support personnel and can involve third-party suppliers if the incident is not resolved quickly. Again, if the root cause cannot be corrected, the incident is transferred to Problem Management.
Incident Monitoring and Escalation - IT organizations following ITIL best practices establish and maintain a system for monitoring the status and escalation of each reported IT incident. IT managers responsible for Incident Management should be able to track the number of incidents currently open and see where each one stands in the process. When the Incident Management team takes too long to respond, service level agreements are breached and service outages turn into business interruptions. Incident monitoring ensures that Incident Management tickets are resolved and moved through the process in a timely fashion, so that the organization's service levels are maintained.
Incident Closure and Evaluation - Once an incident has been resolved, the incident record goes through a final quality-control step. This sub-process confirms that the incident has been resolved and that its lifecycle has been documented in sufficient detail. The findings from the incident report can be used by the organization in the future, including as an input to the Knowledge Management process. Incident Closure and Evaluation helps ensure that the organization captures all important information about an incident and learns from it once it has been resolved.
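The escalation rule for 1st-level support (escalate when the target time has passed or when specialist knowledge is needed) can be sketched in a few lines of Python. This is an illustrative sketch only: the `Incident` fields, the function names, and the 30-minute target are hypothetical and not taken from any ITSM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Minimal incident record; the field names are hypothetical."""
    ticket_id: str
    opened_at: datetime
    needs_specialist: bool = False
    resolved: bool = False
    support_level: int = 1


def should_escalate(incident: Incident, now: datetime, target: timedelta) -> bool:
    """True if an unresolved 1st-level incident is past its target time or needs a specialist."""
    if incident.resolved or incident.support_level != 1:
        return False
    overdue = now - incident.opened_at > target
    return overdue or incident.needs_specialist


def escalate_if_needed(incident: Incident, now: datetime, target: timedelta) -> Incident:
    """Hand the incident over to 2nd-level support when the escalation rule fires."""
    if should_escalate(incident, now, target):
        incident.support_level = 2
    return incident


if __name__ == "__main__":
    target = timedelta(minutes=30)  # hypothetical 1st-level target resolution time
    opened = datetime(2018, 10, 3, 9, 0)
    ticket = Incident("INC-1001", opened)
    escalate_if_needed(ticket, opened + timedelta(minutes=45), target)
    print(ticket.support_level)  # 2: the 30-minute target has passed
```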
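A monitoring view of this kind boils down to comparing each open ticket's age with its resolution target. The Python sketch below assumes hypothetical per-priority targets (`RESOLUTION_TARGETS`) and a hypothetical "at risk" threshold of 75% of the target; real values come from the organization's own service level agreements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Hypothetical resolution targets per priority; real values come from the organization's SLAs.
RESOLUTION_TARGETS = {
    "P1": timedelta(hours=4),
    "P2": timedelta(hours=8),
    "P3": timedelta(days=2),
}


@dataclass
class OpenIncident:
    ticket_id: str
    priority: str
    opened_at: datetime


def sla_status(incident: OpenIncident, now: datetime, warn_fraction: float = 0.75) -> str:
    """Classify an open incident as 'breached', 'at risk', or 'ok' against its target."""
    target = RESOLUTION_TARGETS[incident.priority]
    elapsed = now - incident.opened_at
    if elapsed > target:
        return "breached"
    if elapsed >= target * warn_fraction:
        return "at risk"
    return "ok"


def sla_report(incidents: Iterable[OpenIncident], now: datetime) -> Dict[str, List[str]]:
    """Group ticket IDs by SLA status so managers can see which need attention first."""
    report: Dict[str, List[str]] = {"breached": [], "at risk": [], "ok": []}
    for incident in incidents:
        report[sla_status(incident, now)].append(incident.ticket_id)
    return report


if __name__ == "__main__":
    now = datetime(2018, 10, 3, 12, 0)
    queue = [
        OpenIncident("INC-1", "P1", now - timedelta(hours=5)),
        OpenIncident("INC-2", "P1", now - timedelta(hours=3, minutes=30)),
        OpenIncident("INC-3", "P3", now - timedelta(hours=1)),
    ]
    print(sla_report(queue, now))
    # {'breached': ['INC-1'], 'at risk': ['INC-2'], 'ok': ['INC-3']}
```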
Proactive User Information - Incident reports are usually submitted through the organization's service desk, which acts as the single point of contact for IT resources within the organization. The service desk team can also use this communication portal to proactively inform users about known issues and service outages. By distributing up-to-date outage information throughout the organization, this sub-process cuts down on the number of requests and inquiries the service desk receives.
Incident Management Reporting - This sub-process works to capture information from the Incident Management process and supply it to the other Service Management processes, ensuring that the organization has an opportunity to improve its performance based on data from past incidents.
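Taken together, the sub-processes described above move each incident record through a lifecycle. One way to picture it is as a small state machine, sketched below in Python. The state names and allowed transitions are our own simplification of this article's description (including modeling a transfer to Problem Management as a final hand-off), not a model defined by ITIL; the recorded history is the kind of audit trail that closure, evaluation, and reporting rely on.

```python
from enum import Enum, auto


class State(Enum):
    LOGGED = auto()
    LEVEL_1 = auto()
    LEVEL_2 = auto()
    MAJOR_INCIDENT = auto()
    PROBLEM_MANAGEMENT = auto()
    RESOLVED = auto()
    CLOSED = auto()


# Allowed transitions (a simplified, illustrative assumption).
ALLOWED = {
    State.LOGGED: {State.LEVEL_1, State.MAJOR_INCIDENT},
    State.LEVEL_1: {State.LEVEL_2, State.MAJOR_INCIDENT, State.RESOLVED},
    State.LEVEL_2: {State.MAJOR_INCIDENT, State.PROBLEM_MANAGEMENT, State.RESOLVED},
    State.MAJOR_INCIDENT: {State.PROBLEM_MANAGEMENT, State.RESOLVED},
    State.PROBLEM_MANAGEMENT: set(),  # handed off to the Problem Management process
    State.RESOLVED: {State.CLOSED, State.LEVEL_1},  # closure review, or re-open
    State.CLOSED: set(),
}


class IncidentLifecycle:
    """Tracks one incident's state and keeps a history for closure and reporting."""

    def __init__(self) -> None:
        self.state = State.LOGGED
        self.history = [State.LOGGED]

    def move_to(self, new_state: State) -> None:
        """Apply a transition, rejecting any move the workflow does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.history.append(new_state)


if __name__ == "__main__":
    incident = IncidentLifecycle()
    for step in (State.LEVEL_1, State.LEVEL_2, State.RESOLVED, State.CLOSED):
        incident.move_to(step)
    print([s.name for s in incident.history])
    # ['LOGGED', 'LEVEL_1', 'LEVEL_2', 'RESOLVED', 'CLOSED']
```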
How Do Organizations Measure Success in Incident Management?
Measuring the success of processes across the ITIL service lifecycle is the key to continuous service improvement. Organizations should decide on metrics that will be used to monitor the performance of each process and report accurately on those metrics to help identify the best opportunities for improvement. Below, we've listed five of the most significant KPIs that organizations can measure to ensure their Incident Management process is performing up to par.
Status of Incidents - Organizations can use software to track the status of incidents that are currently being managed as part of the Incident Management process. A real-time view of all open incidents can reveal where the largest backlogs are forming and how the organization can best commit resources to improve flow and shorten resolution times. For example, if many incidents are getting stuck at 2nd-level support without being resolved, the company could pursue several potential solutions:
- Add more 2nd-level support staff to expedite handling of incidents.
- Add more training for 2nd-level support staff to increase efficiency of incident resolution.
- Add more training for 1st-level support staff to reduce escalations.
- Engage 3rd-level support that can help manage the backlog of incidents of a specific type (for example, if there is a backlog of incidents for a malfunctioning printer, contact the manufacturer to help resolve issues).
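Spotting where incidents are piling up can be as simple as counting open tickets per stage. The Python sketch below assumes each incident is represented as a hypothetical `(ticket_id, stage)` pair and that "Resolved" and "Closed" are the stage names for finished work; a real ITSM tool would supply this data through its own reports.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Stages treated as "no longer open"; the stage names are hypothetical.
CLOSED_STAGES = {"Resolved", "Closed"}


def open_backlog(incidents: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count open incidents per stage, largest backlog first.

    Each incident is a (ticket_id, stage) pair, e.g. ("INC-7", "2nd-Level Support").
    """
    counts = Counter(stage for _, stage in incidents if stage not in CLOSED_STAGES)
    return counts.most_common()


if __name__ == "__main__":
    snapshot = [
        ("INC-1", "1st-Level Support"),
        ("INC-2", "2nd-Level Support"),
        ("INC-3", "2nd-Level Support"),
        ("INC-4", "2nd-Level Support"),
        ("INC-5", "Closed"),
        ("INC-6", "1st-Level Support"),
    ]
    print(open_backlog(snapshot))
    # [('2nd-Level Support', 3), ('1st-Level Support', 2)]
```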
First Call Resolution - The first call resolution rate tells us how often incidents are resolved by 1st-level support staff on the first call. A high rate is usually the result of well-trained staff with sufficient experience and access to resources and knowledge.
Average Cost per Incident/Incident Resolution Effort - Organizations can choose to measure either the average cost per incident managed or the average effort spent to resolve each incident. Organizations want to minimize these costs while meeting service level agreements and maintaining customer satisfaction. IT investment that leads to enhanced business uptime should generate a positive return on investment.
Average Initial Response Time - This KPI measures the average time between when a user reports an incident and when the service desk responds to it. If the service desk resolves incidents quickly but users wait three hours for a first response, the organization might consider adding more 1st-level service technicians to reduce the response time and correspondingly increase service availability.
Number of Repeated Incidents - Repeated or re-opened incidents are bad news for your organization. They can mean that support technicians have not identified the root cause of an issue, so it keeps happening. In other cases, the IT staff knows how to resolve the issue and users could do it themselves, but no resources are available to facilitate self-service. Repeated incidents can be avoided by finding the root cause of an issue and proactively communicating with users to help them resolve the issue without reporting it to IT.
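As a worked example, the first call resolution rate can be computed as the share of closed incidents that 1st-level support resolved on the first contact. In the Python sketch below, the `ClosedIncident` fields are hypothetical, and using all closed incidents as the denominator is an assumption; some organizations count only incidents that were eligible for 1st-level resolution.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClosedIncident:
    ticket_id: str
    resolved_by_level: int  # support level that resolved the incident
    contacts: int           # user contacts needed before resolution


def first_call_resolution_rate(incidents: Sequence[ClosedIncident]) -> float:
    """Fraction of incidents resolved by 1st-level support on the first contact."""
    if not incidents:
        raise ValueError("no incidents in the reporting period")
    first_call = sum(
        1 for i in incidents if i.resolved_by_level == 1 and i.contacts == 1
    )
    return first_call / len(incidents)


if __name__ == "__main__":
    period = [
        ClosedIncident("INC-1", 1, 1),
        ClosedIncident("INC-2", 1, 2),  # resolved at 1st level, but on a follow-up call
        ClosedIncident("INC-3", 2, 1),  # escalated to 2nd level
        ClosedIncident("INC-4", 1, 1),
    ]
    print(f"{first_call_resolution_rate(period):.0%}")  # 50%
```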
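Both variants of this KPI can be derived from the same work logs. The Python sketch below assumes each incident carries a hypothetical log of `(support level, hours worked)` entries and that each support level has a fully loaded hourly cost; the rates in `HOURLY_RATE` are placeholders, not benchmarks.

```python
from typing import Dict, List, Tuple

# Hypothetical fully loaded hourly cost of each support level.
HOURLY_RATE: Dict[int, float] = {1: 40.0, 2: 70.0, 3: 120.0}

# One work log per incident: a list of (support level, hours worked) entries.
WorkLog = List[Tuple[int, float]]


def average_effort_hours(logs: List[WorkLog]) -> float:
    """Mean number of hours spent resolving each incident."""
    if not logs:
        raise ValueError("no incidents to average")
    return sum(hours for log in logs for _, hours in log) / len(logs)


def average_cost(logs: List[WorkLog], rates: Dict[int, float] = HOURLY_RATE) -> float:
    """Mean labor cost per incident, pricing each hour at the rate of the level that worked it."""
    if not logs:
        raise ValueError("no incidents to average")
    return sum(hours * rates[level] for log in logs for level, hours in log) / len(logs)


if __name__ == "__main__":
    logs = [
        [(1, 0.5)],            # fixed on the first call
        [(1, 1.0), (2, 2.0)],  # escalated to 2nd-level support
    ]
    print(average_effort_hours(logs))  # 1.75
    print(average_cost(logs))          # 100.0
```

Pricing hours by support level makes the cost of escalations visible: in this example the escalated incident costs nine times as much as the one fixed on the first call.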
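The calculation itself is a mean over `(reported_at, first_response_at)` timestamp pairs. The Python sketch below assumes those two timestamps are available for each incident; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def average_initial_response(events: Sequence[Tuple[datetime, datetime]]) -> timedelta:
    """Mean time from when an incident was reported to the service desk's first response.

    Each entry is a (reported_at, first_response_at) pair.
    """
    if not events:
        raise ValueError("no incidents to measure")
    total = sum((responded - reported for reported, responded in events), timedelta())
    return total / len(events)


if __name__ == "__main__":
    t = datetime(2018, 10, 3, 9, 0)
    events = [
        (t, t + timedelta(minutes=10)),
        (t, t + timedelta(minutes=30)),
        (t, t + timedelta(hours=3, minutes=20)),
    ]
    print(average_initial_response(events))  # 1:20:00
```

Note how a single slow response pulls the average up to 80 minutes even though two of the three incidents were answered within half an hour; reporting a median alongside the mean can make such outliers easier to spot.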
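A first pass at finding repeated incidents is to group records by what they affect and count duplicates, alongside a simple count of re-opened tickets. In the Python sketch below, grouping by a hypothetical `(service, category)` key is an assumption; a real tool might instead match on configuration items or known-error records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class IncidentRecord:
    ticket_id: str
    service: str    # affected service, e.g. "Email"
    category: str   # symptom category assigned at logging time
    reopened: bool = False


def repeated_incidents(records: Sequence[IncidentRecord]) -> Dict[Tuple[str, str], int]:
    """Return (service, category) pairs reported more than once, with their counts."""
    counts = Counter((r.service, r.category) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


def reopened_count(records: Sequence[IncidentRecord]) -> int:
    """Number of incidents that were re-opened after being resolved."""
    return sum(1 for r in records if r.reopened)


if __name__ == "__main__":
    month = [
        IncidentRecord("INC-1", "Email", "Login failure"),
        IncidentRecord("INC-2", "Email", "Login failure", reopened=True),
        IncidentRecord("INC-3", "Printing", "Paper jam"),
    ]
    print(repeated_incidents(month))  # {('Email', 'Login failure'): 2}
    print(reopened_count(month))      # 1
```

Pairs that keep appearing in this report are natural candidates for a Problem Record or for a self-service article.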
How Do Software Solutions Support Incident Management in IT Organizations?
Today's IT organizations use IT service management (ITSM) and/or service desk software to deliver services based on ITIL best practices, including the sub-processes of Incident Management. IT teams that want to transition from a break-fix model of Incident Management to a consolidated IT service desk model can benefit from the capabilities of Cherwell Service Management CORE, the ITSM software that has already helped thousands of companies around the world establish ITIL compliance and improve their Incident Management processes.
Service Management CORE is PinkVERIFY™ certified for eleven ITIL processes out of the box, including Incident Management, and lets organizations start automating their processes to save time and reduce costs. You can also view our Essential Guide to ITIL Incident Management, where we break down the five-step process for effectively implementing Incident Management at your organization.
Cherwell also helps organizations manage their IT infrastructure with IT Asset Management (ITAM), which includes features for computer inventory, software license compliance, software usage monitoring, IT purchasing, and more.
Incident Management is one of the 26 processes of ITIL and falls under the "Service Operation" stage of the IT service lifecycle. The objective of this process is to return service to users as quickly as possible after an incident interrupts the service. The sub-processes of this process include:
- Incident Management Support
- Incident Logging and Categorization
- Immediate Incident Resolution by 1st-Level Support
- Incident Resolution by 2nd-Level Support
- Handling of Major Incidents
- Incident Monitoring and Escalation
- Incident Closure and Evaluation
- Proactive User Information
- Incident Management Reporting
Organizations can use a variety of KPI measurements to track the success of their Incident Management processes and initiatives. Many organizations employ software such as Cherwell CORE that automates parts of the Incident Management process and facilitates compliance and integration with ITIL and its additional processes.