The Incident Management Process
(Also Known as the Incident Management Lifecycle)
Incident Management Process Flow
How does Incident Management work? ITIL provides a framework, not a rigid set of instructions, for effective IT service delivery, and organizations can adapt it to their own needs. The Incident Management process can be structured to handle Incidents raised automatically by an event management tool, or reported by users and service desk technicians through a self-service portal, by telephone, by email or in person. The Incident Management lifecycle includes:
1) Incident identification
Ideally, Incidents are identified at a very early stage through automated event monitoring, before they impact a user. However, this isn’t always the case. Sometimes an Incident is identified only when the impacted user reports it to the service desk.
2) Incident logging
In order to maintain a complete historical record, all Incidents, regardless of the method used to identify and report them to the service desk, must be logged with all relevant details, including date/time, user information, description, related Configuration Item from the CMDB, associated Problem, resolution details and closure information.
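As a rough illustration, the details listed above could be captured in a single record type. This is a minimal sketch: the class name `IncidentRecord`, its field names and the choice of which details must be present before closure are assumptions made for the example, not part of ITIL or of any particular service desk tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class IncidentRecord:
    """Hypothetical incident record; field names are illustrative only."""
    reported_by: str                            # user information
    description: str
    source: str                                 # e.g. "monitoring", "portal", "phone"
    logged_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # date/time of logging
    )
    configuration_item: Optional[str] = None    # related CI from the CMDB
    problem_id: Optional[str] = None            # associated Problem, if any
    resolution: Optional[str] = None            # resolution details
    closed_at: Optional[datetime] = None        # closure information

    def missing_details(self) -> list[str]:
        """Return the names of details still empty (assumed required for closure)."""
        required = ("configuration_item", "resolution", "closed_at")
        return [name for name in required if getattr(self, name) is None]


incident = IncidentRecord("jdoe", "Cannot open shared drive", source="phone")
print(incident.missing_details())
# ['configuration_item', 'resolution', 'closed_at']
```

Logging every Incident through one structure like this, whatever channel it arrived on, is what keeps the historical record complete enough for later reporting.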
- Incident classification – Once an Incident is logged, all appropriate categories must be selected so that it can be properly assigned and escalated, and so that Incident frequencies and trends can be monitored.
- Incident prioritization – Assigning priority is critical in determining how, when and by whom the Incident will be handled. Priority is based on impact, such as the number of affected users or the effect on the business, and on urgency, and it determines how quickly resolution is required.
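One common way to make prioritization repeatable is a lookup that combines impact and urgency into a single priority. The sketch below assumes three levels for each input, a priority scale from 1 to 5, and target resolution times in hours. All of these values are illustrative assumptions and should be replaced with an organization's own service level targets.

```python
from enum import IntEnum


class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Assumed target resolution times in hours per priority; not taken from ITIL.
TARGET_HOURS = {1: 1, 2: 8, 3: 24, 4: 48, 5: 120}


def priority(impact: Level, urgency: Level) -> int:
    """Map impact and urgency to a priority from 1 (highest) to 5 (lowest)."""
    return impact + urgency - 1


p = priority(Level.HIGH, Level.MEDIUM)
print(p, TARGET_HOURS[p])
# 2 8
```

With this arithmetic, high impact and high urgency give priority 1, while low impact and low urgency give priority 5. A table keyed on both inputs would work just as well and is easier to adjust cell by cell.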
3) Incident investigation and diagnosis
This step takes place immediately in order to determine the best course for correction. The technician may rely on the knowledge base, FAQs or known errors for diagnosis and/or resolution.
4) Incident assignment or escalation
Initially, the service desk technician attempts to resolve the Incident. However, if the service desk is unable to provide resolution, the Incident is escalated to the appropriate level of support, possibly involving either second- or third-level technical support staff who possess the skills to resolve the Incident.
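The escalation path described above can be sketched as trying each support tier in order until one of them resolves the Incident. In this illustration each tier is modeled as a plain function that returns a resolution or `None`. The function names, tier labels and matching rules are hypothetical stand-ins for real support teams.

```python
from typing import Callable, Optional

# A tier takes an incident description and returns a resolution, or None.
Tier = Callable[[str], Optional[str]]


def handle(description: str,
           tiers: list[tuple[str, Tier]]) -> tuple[Optional[str], Optional[str]]:
    """Try each tier in order; return (tier name, resolution) for the first success."""
    for name, attempt in tiers:
        resolution = attempt(description)
        if resolution is not None:
            return name, resolution
    return None, None  # unresolved at every tier


def service_desk(description: str) -> Optional[str]:
    return "Reset password" if "password" in description else None


def second_level(description: str) -> Optional[str]:
    return "Restarted print spooler" if "printer" in description else None


tiers = [("service desk", service_desk), ("second level", second_level)]
print(handle("printer offline", tiers))
# ('second level', 'Restarted print spooler')
```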
5) Incident resolution
Once a solution has been identified, it can be implemented and tested to confirm service recovery.
6) Incident closure
Following confirmation that the Incident has been resolved, and the end-user is satisfied and in agreement, the Incident can be closed. The service desk technician should ensure that the initial classification details are accurate for future reference and reporting.
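The steps up to closure can be viewed as a small state machine in which an Incident may only move between certain statuses. The sketch below is one possible encoding. The status names and the allowed transitions, including reopening a resolved Incident when the user does not agree, are assumptions chosen for illustration.

```python
from enum import Enum


class Status(Enum):
    IDENTIFIED = "identified"
    LOGGED = "logged"
    IN_DIAGNOSIS = "in diagnosis"
    ESCALATED = "escalated"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Assumed transitions; adjust to match your own process.
ALLOWED = {
    Status.IDENTIFIED: {Status.LOGGED},
    Status.LOGGED: {Status.IN_DIAGNOSIS},
    Status.IN_DIAGNOSIS: {Status.ESCALATED, Status.RESOLVED},
    Status.ESCALATED: {Status.IN_DIAGNOSIS, Status.RESOLVED},
    Status.RESOLVED: {Status.CLOSED, Status.IN_DIAGNOSIS},  # reopen if user disagrees
    Status.CLOSED: set(),
}


def advance(current: Status, new: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


status = Status.IDENTIFIED
for step in (Status.LOGGED, Status.IN_DIAGNOSIS, Status.RESOLVED, Status.CLOSED):
    status = advance(status, step)
print(status.value)
# closed
```

Rejecting a direct jump from logged to closed, for example, enforces the rule that an Incident is closed only after resolution has been confirmed.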
7) User satisfaction survey
A user satisfaction survey may be used to gauge users’ overall satisfaction with service delivery. This is one of the most effective ways to build and maintain a positive relationship with your customers and users, especially if you pay close attention to their feedback and implement improvements based on it. There are several methods for gathering feedback, including after-call surveys, personal telephone surveys and, most commonly, online surveys.
Several best practices apply when developing a user satisfaction survey:
- Explain the purpose of the survey
- Distribute the survey randomly for the most accurate results
- Keep it short, yet thorough
- Clearly state your questions
- Keep open-ended questions to a minimum
- Share survey results and the improvements you have made
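Random distribution, one of the practices listed above, can be as simple as drawing a random sample of recently closed Incidents and surveying only those users. The following is a minimal sketch. The function name, the 20% sampling fraction and the incident identifiers are assumptions for illustration.

```python
import random
from typing import Optional


def pick_survey_recipients(closed_incident_ids: list[str],
                           fraction: float = 0.2,
                           seed: Optional[int] = None) -> list[str]:
    """Return a random sample of closed incidents whose users receive a survey."""
    if not closed_incident_ids:
        return []
    rng = random.Random(seed)  # pass a seed only when a repeatable draw is needed
    wanted = round(len(closed_incident_ids) * fraction)
    k = min(len(closed_incident_ids), max(1, wanted))  # always survey at least one
    return rng.sample(closed_incident_ids, k)


closed = [f"INC-{n:04d}" for n in range(1, 51)]
recipients = pick_survey_recipients(closed)
print(len(recipients))
# 10
```

Sampling without replacement ensures that no Incident is surveyed twice in the same batch.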
Major Incidents – Occasionally, a major Incident will occur, causing serious interruption to important business services. These high-impact, high-urgency Incidents typically affect a large number of users and deprive the business of one or more critical services. In the case of a major Incident, a team will come together, placing the highest priority on restoring normal operation.
Each organization will develop its own criteria for identifying a major Incident, but typical characteristics include:
- Impacts a large number of customers
- Cost of downtime is substantial to customers and/or the business
- The time and effort needed to restore normal operation exceed agreed service levels
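Because each organization sets its own criteria, the characteristics above are easiest to apply when written down as explicit thresholds. The sketch below checks an Incident against three such thresholds and reports which ones it meets. The threshold values, the names, and the idea that meeting any one characteristic is enough to flag a major Incident are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MajorIncidentThresholds:
    """Assumed thresholds; every organization defines its own."""
    min_affected_users: int = 500
    min_downtime_cost_per_hour: float = 10_000.0


def major_incident_reasons(affected_users: int,
                           downtime_cost_per_hour: float,
                           estimated_restore_hours: float,
                           agreed_restore_hours: float,
                           thresholds: MajorIncidentThresholds = MajorIncidentThresholds(),
                           ) -> list[str]:
    """Return the major-Incident characteristics that this Incident meets."""
    reasons = []
    if affected_users >= thresholds.min_affected_users:
        reasons.append("impacts a large number of customers")
    if downtime_cost_per_hour >= thresholds.min_downtime_cost_per_hour:
        reasons.append("cost of downtime is substantial")
    if estimated_restore_hours > agreed_restore_hours:
        reasons.append("restoration exceeds agreed service levels")
    return reasons


reasons = major_incident_reasons(1200, 2_500.0, 6, 4)
print(bool(reasons), reasons)
# True ['impacts a large number of customers', 'restoration exceeds agreed service levels']
```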