The Essential Guide to ITIL Incident Management
Table of Contents
- Foreword and Background
- What is Information Technology Infrastructure Library (ITIL)?
- What is Incident Management?
- The Incident Management Process
- Inter-Related ITIL Processes
- Incident Management Roles and Responsibilities
- Incident Management Key Performance Indicators (KPIs)
- Best Practices for Implementing Incident Management
- Feature Checklist for Incident Management Software
Foreword and Background
Understanding Your Level of Organizational Maturity When Implementing ITIL
Incident Management is usually the first IT Infrastructure Library (ITIL®) process targeted for implementation or improvement among organizations seeking to adopt ITIL best practices. The reasons for this are simple: Improved Consumerization and Service Value Realization. Incident Management is the day-to-day process utilized by the organization through engagement with the service desk or self-help technology for rapid service restoration.
High performance in this process is critical to the organization and to the users of impacted services. Without it, the result is chaos, which hurts user performance, organizational performance and overall economic value for both the customer and the supplier of the service. Incident Management itself should support the business strategy, and the business strategy should enable the means by which Incident Management is performed to obtain value.
Anthony Orr: The Do's and Don'ts of Implementing IT Incident Management
In this brief video, ITIL Author and Examiner, Anthony Orr, shares best practices and common mistakes relating to implementing IT Incident Management within the enterprise.
A story I tell often to help organizations and people understand their maturity with respect to this process goes like this: I was keynote speaking at a conference, and one of the attendees came up to me afterward and said, “Hi Anthony, I have to tell you about our success. We are actually having a celebration at the office with cake because we have managed our one-millionth Incident.” Most people to whom I tell this story immediately smile or chuckle a little because they can’t believe this is something to celebrate. But it is! This is an indicator of the company’s level of maturity with Incident Management. To understand this better, the “before” state was simply this: there was no service desk, and when help was needed in the organization the user had to find help on his/her own. There are many organizations globally that still do not have a service desk. And it is important to understand your level of maturity.
Understanding your current state helps you reach your desired state. Performance indicators and critical success factors tied to objectives help you maintain the current level of maturity and initiate projects that lead to an improved, desired level of maturity. In the above story, depending on business needs, the goal may be to improve first-call resolution, implement self-help capabilities or any number of other things. It is important that organizations focus on the essentials for quick wins. At the highest level of maturity for this process, users will experience no Incidents related to services delivered.
As you read through the content in this publication, keep in mind the value to the business of doing what is essential for your organization, and doing it right by leveraging people, process, technology, and suppliers to meet your objectives. Service excellence is a journey that never ends and must be practiced. Celebrate your successes!
What is Information Technology Infrastructure Library (ITIL)?
ITIL is a world-renowned best practice framework, adopted by individuals and organizations in both the public and private sectors to align IT services with the needs of the business. Its most current version, ITIL 2011, consists of five core publications: Service Strategy, Service Design, Service Transition, Service Operation and Continual Service Improvement. This guide provides a comprehensive explanation of Incident Management, a critical process within the Service Operation book.
Service Operation is an essential element of the procedural life cycle, delivering service and value to the business, customers and users. It ensures that agreed upon service levels and quality are achieved or surpassed, providing both an introduction and guidelines to activities that contribute to IT operational excellence.
ITIL Service Operation processes include:
- Incident Management
- Problem Management
- Request Fulfillment
- Event Management
- Access Management
ITIL emerged as a concept when the British government determined the quality of IT service provided to them was inadequate. The Central Computer and Telecommunications Agency, which merged with the Office of Government Commerce in 2000, launched the first version of ITIL, called "Government Information Technology Infrastructure Management," in the early 1980s. The framework spread across Europe in the 1990s.
Version 2 of ITIL was released in 2001, and it quickly became the most popular IT Service Management best practice framework throughout the world. The next major version change came in 2007 with ITIL V3, which emphasizes IT and business alignment.
The most current update to ITIL occurred in 2011 with what is called ITIL 2011 – a tune-up to ITIL V3.
What is Incident Management?
Incident Management is an IT service management process intended to restore “normal” service operation as quickly as possible, minimizing any adverse impact on business operations or the user. Success is achieved by promptly and effectively dealing with all Incidents reported by users, discovered by technical staff or automatically detected by a monitoring solution. The IT Infrastructure Library (ITIL) defines an Incident as “an unplanned interruption to an IT service or reduction in the quality of an IT service.”
Common Incident Examples
Although there are endless reasons users contact the Service Desk for assistance, certain Incidents are common across every organization:
- Active Directory password reset
- Delete Active Directory account
- Error message when trying to launch or access an application
- Printer not printing
- Hardware – printer, fax, scanner, tablet not working
- Monitor flickering
The Purpose and Importance of Incident Management
Each stage of the ITIL Service Lifecycle provides value to the business in one way or another. Service Operation delivers both long-term incremental and short-term ongoing improvements. The primary goal of the Incident Management process is to restore normal service operation as quickly as possible. When successfully implemented, Incident Management offers the following types of benefits:
- Reducing unplanned IT service staff costs by reducing the number of Incident tickets
- Decreasing business and user downtime with faster Incident detection and resolution
- Increasing productivity across the organization by restoring normal operation quickly
- Identifying training opportunities and potential service improvements
- Improving user satisfaction
- Demonstrating IT’s value to the business by aligning IT activities to business priorities
- Reducing the impact on the business and user with improved monitoring
- Reducing lost Incidents
The Incident Management Process (Also Known as the Incident Management Lifecycle)
How does Incident Management work? ITIL provides a framework, not a rigid set of instructions, for effective IT service delivery, adaptable by organizations to meet their IT service delivery needs. The Incident Management process can be structured to manage Incidents reported automatically by an event management tool, by users or service desk technicians via a self-service portal, over the telephone, email or in person. The Incident Management lifecycle includes:
1) Incident identification
Ideally, Incidents are identified at a very early stage through automated event monitoring, even before they impact a user. However, this isn’t always the case. Sometimes an Incident is identified only when the impacted user reports it to the service desk.
2) Incident logging
In order to maintain a complete historical record, all Incidents, regardless of the method used to identify and report them to the service desk, must be logged with all relevant details, including date/time, user information, description, related Configuration Item from the CMDB, associated Problem, resolution details and closure information.
- Incident classification – Once logged, all appropriate categories must be selected in order to properly assign, escalate and monitor frequencies and Incident trends.
- Incident prioritization – Assigning priority is critical in determining how, when and by whom the Incident will be handled. Priority is based on impact (for example, the number of affected users or the effect on the business) and urgency (how quickly resolution is required).
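One common way to make prioritization repeatable is a lookup matrix keyed on impact and urgency. The following is a minimal sketch of that idea, not a scheme prescribed by ITIL: the three-step `Level` scale, the priority codes 1 to 5 and the name `prioritize` are all assumptions chosen for illustration, and each organization would define its own values.

```python
from enum import IntEnum


class Level(IntEnum):
    """Three-step scale used for both impact and urgency (an assumption)."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical matrix: (impact, urgency) -> priority code, 1 = most critical.
# The codes are illustrative; organizations define their own.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): 1,
    (Level.HIGH, Level.MEDIUM): 2,
    (Level.MEDIUM, Level.HIGH): 2,
    (Level.HIGH, Level.LOW): 3,
    (Level.MEDIUM, Level.MEDIUM): 3,
    (Level.LOW, Level.HIGH): 3,
    (Level.MEDIUM, Level.LOW): 4,
    (Level.LOW, Level.MEDIUM): 4,
    (Level.LOW, Level.LOW): 5,
}


def prioritize(impact: Level, urgency: Level) -> int:
    """Look up the priority code for an Incident's impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]


# Example: an outage affecting many users that needs a fix right away.
print(prioritize(Level.HIGH, Level.HIGH))  # prints 1
```

Keeping the matrix as data rather than logic means the service desk can adjust it without changing code.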
3) Incident investigation and diagnosis
This step takes place immediately in order to determine the best course for correction. The technician may rely on the knowledge base, FAQs or known errors for diagnosis and/or resolution.
4) Incident assignment or escalation
Initially, the service desk technician attempts to resolve the Incident. However, if the service desk is unable to provide resolution, the Incident is escalated to the appropriate level of support, possibly involving either second- or third-level technical support staff who possess the skills to resolve the Incident.
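The escalation path described above can be sketched as a walk through the support tiers until one of them can resolve the Incident. This is an illustrative sketch only: the tier names in `SUPPORT_TIERS`, the function `escalate_until_resolved` and the skill sets in the example are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tier names, ordered from the service desk upward.
SUPPORT_TIERS: List[str] = ["first-line", "second-line", "third-line"]


def escalate_until_resolved(
    incident: str,
    can_resolve: Dict[str, Callable[[str], bool]],
) -> Tuple[Optional[str], List[str]]:
    """Try each tier in order; return the resolving tier and the tiers tried.

    Returns (None, tiers_tried) when no tier is able to resolve the Incident.
    """
    tried: List[str] = []
    for tier in SUPPORT_TIERS:
        tried.append(tier)
        if can_resolve[tier](incident):
            return tier, tried
    return None, tried


# Example with made-up skill sets per tier.
skills = {
    "first-line": {"password reset"},
    "second-line": {"password reset", "application error"},
    "third-line": {"password reset", "application error", "hardware fault"},
}
checks = {tier: (lambda inc, s=s: inc in s) for tier, s in skills.items()}
print(escalate_until_resolved("application error", checks))
# prints ('second-line', ['first-line', 'second-line'])
```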
5) Incident resolution
Once a solution has been identified, it is implemented and tested to confirm service recovery.
6) Incident closure
Following confirmation that the Incident has been resolved, and the end-user is satisfied and in agreement, the Incident can be closed. The service desk technician should ensure that the initial classification details are accurate for future reference and reporting.
7) User satisfaction survey
A user satisfaction survey may be used to gauge users’ overall satisfaction with service delivery. This is one of the most effective ways to build and maintain a positive relationship with your customers and users, especially if you pay close attention and implement improvements based on their feedback. There are several methods for gathering feedback, including after-call surveys, personal telephone surveys and, most commonly, the online survey.
There are a number of best practices one ideally follows when developing a user satisfaction survey:
- Explain the purpose of the survey
- Distribute the survey randomly for the most accurate results
- Keep it short, yet thorough
- Clearly state your questions
- Keep open-ended questions to a minimum
- Share survey results and the improvements you have made
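The lifecycle stages above can be modeled as a small state machine that rejects out-of-order transitions, which is roughly how an ITSM tool enforces a workflow. The sketch below is illustrative: the `Stage` names, the `ALLOWED` transition table and the `Incident` class are assumptions, and the reopen path from resolved back to diagnosis is an added assumption for cases where the user does not confirm the fix.

```python
from enum import Enum


class Stage(Enum):
    IDENTIFIED = "identified"
    LOGGED = "logged"
    IN_DIAGNOSIS = "in diagnosis"
    ESCALATED = "escalated"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Hypothetical transition table: current stage -> stages it may move to.
ALLOWED = {
    Stage.IDENTIFIED: {Stage.LOGGED},
    Stage.LOGGED: {Stage.IN_DIAGNOSIS},
    Stage.IN_DIAGNOSIS: {Stage.ESCALATED, Stage.RESOLVED},
    Stage.ESCALATED: {Stage.IN_DIAGNOSIS, Stage.RESOLVED},
    # Assumption: a resolved Incident can be reopened before closure.
    Stage.RESOLVED: {Stage.CLOSED, Stage.IN_DIAGNOSIS},
    Stage.CLOSED: set(),
}


class Incident:
    """Tracks the current stage and the history of stages visited."""

    def __init__(self, summary: str) -> None:
        self.summary = summary
        self.stage = Stage.IDENTIFIED
        self.history = [Stage.IDENTIFIED]

    def advance(self, new_stage: Stage) -> None:
        """Move to new_stage, or raise ValueError if the move is not allowed."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {new_stage.value}"
            )
        self.stage = new_stage
        self.history.append(new_stage)


incident = Incident("Printer not printing")
incident.advance(Stage.LOGGED)
incident.advance(Stage.IN_DIAGNOSIS)
incident.advance(Stage.RESOLVED)
incident.advance(Stage.CLOSED)
print(incident.stage.value)  # prints closed
```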
Major Incidents – Occasionally, a major Incident will occur, causing serious interruption to important business services. These high-impact, high-urgency Incidents typically affect a large number of users and deprive the business of one or more critical services. In the case of a major Incident, a team will come together, placing the highest priority on restoring normal operation.
Each organization will develop its own criteria for identifying a major Incident, but common characteristics include:
- Impacts a large number of customers
- Cost of downtime is substantial to customers and/or the business
- The time and effort involved in restoring normal operation is longer than agreed service levels
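Inter-Related ITIL Processes
Incident Management interacts closely with several other ITIL processes, including:
- Configuration Management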
- Change Management
- Service Level Management
- Availability Management
- Capacity Management
- Event Management
Incident Management Roles and Responsibilities
Well-defined roles and responsibilities are critical to the effective execution of the Incident Management process. The Incident Management team comprises the following:
Incident Manager
The Incident Manager has primary responsibility for driving and continually improving the Incident Management process. In small- to mid-size organizations, this role is commonly assigned to the Service Desk Manager; in larger organizations, this may be a separately defined role. Key responsibilities include team leadership, reporting key performance indicators (KPIs) back to management, direct management of first and second line support, managing the Incident Management system and enforcing the Incident Management process workflow.
First Line Support
First Line Service Desk Technicians are the single point of contact for end users seeking information and reporting service disruptions. They are primarily responsible for the initial support and classification of Incidents and the immediate attempt to restore a failed service as quickly as possible. If they are unable to resolve the Incident, the First Line Service Desk Technician will route the Incident to appropriate support personnel, monitor activity and keep users up to date on the status of their Incident.
Second Line Support
Second Line Support Technicians typically have more advanced knowledge than First Line Service Desk Technicians. They may become responsible for Incidents that First Line Support is unable to resolve. These technicians may interact with third party experts from software or hardware vendors to help restore normal service as quickly as possible.
Incident Management Key Performance Indicators (KPIs)
Measurements are important across all stages of the ITIL lifecycle. Each process has metrics that should be monitored and reported to effectively evaluate overall performance. Continual Service Improvement requires that the performance of each process be measured to identify areas needing improvement.
Typical Incident Management metrics include:
- Total Incidents reported (per category, priority, person, organizational unit, etc.)
- Status of Incidents
- Time between Incident creation and resolution
- Incidents and SLA (reached, breached)
- Average cost per Incident
- Reopen rate
- Incidents handled without escalation
- First call resolution
- Configuration Items experiencing recurring Incidents
- Incidents by time of day
KPIs should be related to Critical Success Factors (CSFs), and CSFs should be related to objectives. This relationship supports decisions about maintaining the current state and improving toward the desired state. Although each organization is different, relevant reports for users, staff and management will help support important decisions that can be used to improve both the processes and the business as a whole.
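Several of the metrics listed above reduce to simple arithmetic over closed Incident records. The sketch below shows one hedged way to compute a few of them. The `IncidentRecord` fields and the `kpi_summary` function are hypothetical, and a real ITSM tool would expose its own schema and reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class IncidentRecord:
    """Hypothetical minimal record of a resolved Incident."""
    created: datetime
    resolved: datetime
    escalated: bool
    reopened: bool
    sla_target: timedelta


def kpi_summary(records: List[IncidentRecord]) -> Dict[str, float]:
    """Compute a handful of Incident Management KPIs from resolved records."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")
    durations = [r.resolved - r.created for r in records]
    breached = sum(d > r.sla_target for d, r in zip(durations, records))
    return {
        "total_incidents": total,
        "mean_hours_to_resolve": sum(d.total_seconds() for d in durations) / total / 3600,
        "handled_without_escalation_pct": 100 * sum(not r.escalated for r in records) / total,
        "reopen_rate_pct": 100 * sum(r.reopened for r in records) / total,
        "sla_breached_pct": 100 * breached / total,
    }


# Example with made-up data: four Incidents opened at 09:00, SLA target 4 hours.
start = datetime(2024, 1, 8, 9, 0)
sla = timedelta(hours=4)
sample = [
    IncidentRecord(start, start + timedelta(hours=1), False, False, sla),
    IncidentRecord(start, start + timedelta(hours=2), False, True, sla),
    IncidentRecord(start, start + timedelta(hours=3), False, False, sla),
    IncidentRecord(start, start + timedelta(hours=10), True, False, sla),
]
print(kpi_summary(sample))
```

Breaking the same figures down per category, priority or organizational unit only requires grouping the records before calling the function.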
Best Practices for Implementing Incident Management
Adopting the ITIL framework within a business can be a daunting task. As with any ITIL process, Incident Management implementation requires support from the business. Of particular importance is gaining buy-in from executives and upper management. Before beginning the adoption process, it’s important to have at least one person dedicated to the overall project management and orchestration of adherence to best practices for Incident Management. It is also extremely helpful to have an IT service management (ITSM) tool in place that will support your current state processes and desired future state processes, as well as a Service Desk acting as the primary interface with the IT department.
1) Understand the current Incident Management process
Occasionally an organization does not have a consistent process for handling incidents, or they have a less sophisticated one in place. Either way, it is important to map the existing process as well as possible in an effort to understand what the existing Service Desk process offers.
2) Identify long-term Incident Management process vision
It is also important to understand what the organization expects from the Incident Management process. The expectation may be based on generic Incident Management templates included with the ITSM tool or a more custom process based on the organization’s specific needs.
3) Conduct a gap analysis
Next, identify what must change to move the organization from its current Incident Management process to its long-term vision for Incident Management. This will arm you with valuable information about the effort, time, money and resources necessary to achieve your Incident Management objectives and your overall service goals.
4) Create an implementation road map
Adopting any ITIL process takes time, and you will need a road map to help set expectations for management. Use that road map to describe the activities, timeframe and effort necessary to deliver. The road map should include quick wins, tool implementation, process changes, people and organization enablement, communication plans and overall governance changes.
5) Begin project implementation
It’s time for implementation to begin. Create a project plan that defines the actions or tasks, responsibilities and timeline for completion of all tasks. Communicate successes as you achieve each milestone, demonstrating your progress toward your ultimate implementation goal.
Feature Checklist for Incident Management Software
For IT organizations evaluating Incident Management software and/or IT service management suites that offer Incident Management capabilities, it is important to understand the types of features required to support key processes. At a minimum, Incident Management software should provide the following capabilities:
- Create, modify, resolve, and close incident records
- Generate unique record numbers associated with each incident record
- Link incidents to problem records, knowledge articles, known workarounds, and requests for change
- Link configuration management data to incident record
- Notify incident owners when associated problem is resolved
- Automatically record historical data in an audit log
- Configurable incident categorization
- Incident search and reporting capabilities
- Route incidents based on resource availability, time-zones, sites, etc.
- Prioritize, assign, and escalate incidents based on categorization; escalate based on priority or other categorization
- Integrate with event monitoring solutions with the ability to automatically create, update, and close incidents
- Flexible field configurations, including free text, drop down, date/time, attachments, and screen captures
- Link incidents to customer data
- Utilize knowledge base solutions/scripts for diagnosis and resolution
- Assign incidents or associated tasks to external service providers
- Assign incidents to multiple assignees
- Create a problem or request for change from an incident record
- Automated incident alerts (to IT staff and/or end-user) based on deadlines, SLAs, closure, and other activity
- Link incident records to SLAs
- Collect feedback from end-users via a customer satisfaction survey
- Initiate an incident on behalf of someone else
- Stop-the-clock functionality to pause the SLA timer while an incident is on hold
- Differentiate between an incident and a service request
- Reactivate resolved incident
- Automatically determine priority based on impact and urgency
- Integrate with Telephony/ACD system to pre-populate customer information based on caller ID
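Of the capabilities above, stopping the SLA clock is the one whose arithmetic is easiest to get wrong, so a small sketch may help when evaluating tools. It assumes a single timer per incident and measures elapsed time as wall-clock time minus time spent on hold. The `SlaClock` class and its method names are hypothetical and do not reflect any particular product.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


class SlaClock:
    """SLA timer that excludes periods when the incident is on hold."""

    def __init__(self, started: datetime, target: timedelta) -> None:
        self.started = started
        self.target = target
        # Each hold is (start, end); end is None while the hold is open.
        self._holds: List[Tuple[datetime, Optional[datetime]]] = []

    def pause(self, at: datetime) -> None:
        if self._holds and self._holds[-1][1] is None:
            raise ValueError("clock is already paused")
        self._holds.append((at, None))

    def resume(self, at: datetime) -> None:
        if not self._holds or self._holds[-1][1] is not None:
            raise ValueError("clock is not paused")
        self._holds[-1] = (self._holds[-1][0], at)

    def elapsed(self, now: datetime) -> timedelta:
        """Time counted against the SLA as of `now`."""
        held = timedelta()
        for start, end in self._holds:
            held += (end if end is not None else now) - start
        return (now - self.started) - held

    def breached(self, now: datetime) -> bool:
        return self.elapsed(now) > self.target


# Example: opened 09:00 with a 4-hour target, on hold from 10:00 to 13:00.
clock = SlaClock(datetime(2024, 1, 8, 9, 0), timedelta(hours=4))
clock.pause(datetime(2024, 1, 8, 10, 0))
clock.resume(datetime(2024, 1, 8, 13, 0))
print(clock.elapsed(datetime(2024, 1, 8, 14, 0)))  # prints 2:00:00
```

A production tool would typically also account for business hours and calendars, which this sketch deliberately leaves out.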