The Essential Guide to ITIL Problem Management
Table of Contents
- Foreword and Background
- What is ITIL Problem Management?
- The Value of Problem Management to the Business
- Problem Management Process Flow
- Inter-Related ITIL Processes: Incident and Change Management
- Problem Management Roles and Responsibilities
- Problem Management Key Performance Indicators (KPIs)
- Feature Checklist for Problem Management Software
Foreword and Background
Understanding Your Level of Organizational Maturity When Implementing ITIL
As IT service desk professionals, we want to deliver and support a service experience for our users that is nothing short of extraordinary. We can manage incidents and restore service as quickly as possible using the Incident Management process, but the ultimate goal is to have no incidents at all. So, I give you Problem Management to help you and your organization achieve these outcomes.
The primary goal of Problem Management is to prevent incidents from occurring, and if incidents do occur, to prevent them from occurring again. Can you imagine a Service Provider just reacting to incidents that continuously repeat themselves and are never truly resolved? Can you imagine a scenario in which it becomes "business as usual" to resolve the same incidents over and over? Over time the number of incidents will increase, the cost of managing incidents will increase, customer and user satisfaction will plummet, the service desk's reputation will suffer, shadow IT initiatives will become the norm, and the collective result will be a detrimental impact on the ability to do business.
Many organizations suffer needlessly because they don't have an effective Problem Management process. Oftentimes, this is because IT teams confuse Problem Management with Incident Management and don't thoroughly understand its relationship to Change Management. While these processes work hand in hand, the goal of Problem Management is to support Incident Management by preventing incidents from happening in the first place, through the use of the Change Management process.
As you read through the content in this guide, keep in mind the value to the business of doing what is essential for your organization, and doing it right by leveraging people, process, technology, and suppliers to meet your objectives. Service excellence is a journey that never ends and must be continually practiced. And above all, when you begin seeing improvement in your overall service delivery, celebrate your successes!
Anthony Orr: The Do's and Don'ts of Implementing IT Problem Management
In this brief video, ITIL Author and Examiner, Anthony Orr, shares best practices and common mistakes relating to implementing IT Problem Management within the enterprise.
What is ITIL Problem Management?
Problem Management is an IT service management process tasked with managing the life cycle of underlying "Problems." Success is achieved by quickly detecting and providing solutions or workarounds to Problems in order to minimize impact on the organization and prevent recurrence. Problem Management also attempts to find the error in the IT infrastructure that is causing the Problem and contributing to the Incidents that users may have. The IT Infrastructure Library (ITIL) provides the following definitions for usage within this process:
- Problem: “The cause of one or more Incidents. The cause is not usually known at the time a Problem record is created”
- Error: “A design flaw or malfunction that causes a failure of one or more IT services or other configuration items”
- Known Error: “A Problem that has a documented root cause and workaround”
- Root Cause: “The underlying or original cause of an incident or problem”
Proactive vs. Reactive Problem Management
Problem Management can be either reactive or proactive.
- Reactive Problem Management is the problem-solving response that occurs when one or more Incidents arise.
- Proactive Problem Management deals with identifying and solving Problems before any Incidents have occurred. This activity is associated with Continual Service Improvement (CSI).
The Role of Problem Management within the ITIL Framework
The IT Infrastructure Library (ITIL), first published in the late 1980s, is the most popular IT Service Management (ITSM) best practice framework followed worldwide. It has been adopted by individuals and organizations in the public and private sectors as a protocol for aligning IT Services with the needs of the business.
ITIL 2011, the most current version, consists of five core publications:
- Service Strategy
- Service Design
- Service Transition
- Service Operation
- Continual Service Improvement
Relationship to ITIL Service Operation
Problem Management is one of the five processes that make up the "Service Operation" publication. ITIL Service Operation is an essential element of the procedural life cycle, focusing on the delivery and support of service, and value to the business, customers, and users. It ensures that agreed-upon service levels and quality are achieved or surpassed, providing both an introduction and guidelines to activities that contribute to IT operational excellence.
ITIL Service Operation processes include:
- Problem Management
- Incident Management
- Request Fulfillment
- Event Management
- Access Management
ISO/IEC 20000 Certification
Implementing ITIL Problem Management along with other ITIL processes can help an organization achieve ISO/IEC 20000 certification. To become ISO/IEC 20000 certified, a business must demonstrate that it has implemented key IT capabilities and service management processes. The two align well: ISO/IEC 20000 describes a set of requirements for an IT Service Management System, while ITIL provides a framework for adopting best practices that meet those requirements.
The Value of Problem Management to the Business
The Problem Management process works in conjunction with Incident and Change Management to provide value to the business in a variety of ways. The primary goal of Problem Management is to minimize the impact of Problems on the business and prevent recurrence. When successful, downtime and disruptions are reduced. Additional benefits include:
- Increased service availability
- Improved service quality
- Decreased Problem resolution time
- Reduction of the number of Incidents
- Increased productivity
- Reduced costs
- Improved customer satisfaction
Adopting and implementing ITIL processes and technology will minimize the chaos that IT organizations can face amid the rapidly changing technology landscape. Although Problem Management is its own process, it depends on an effective Incident Management process and on the proper tools: a common interface, access to available knowledge, configuration management information, and interaction with other related ITIL processes. This ensures that Problems are identified, contain relevant details, and are worked on as quickly as possible. ITIL does not give organizations an exact method for adopting Problem Management, but rather a structured framework that must be adjusted to fit individual business needs and constraints. Regular adjustments to these internal ITIL processes will ultimately support agility, demonstrate business value, and help organizations compete in their market space.
Problem Management Process Flow
How does Problem Management work? ITIL Problem Management is about more than just resolving Incidents; it takes into account the entire life cycle of a Problem. The Problem Management process flow can be structured to manage Problems that are initially reported as Incidents by users or service desk technicians (via a self-service portal, over the telephone, via email, or in person), as well as potential Problems that are detected automatically by ITSM personnel or technology before any Incident occurs. The scope of the Problem Management process flow includes:
1) Problem Detection
Problems can be detected in a variety of ways, including an Incident report, ongoing Incident analysis, automated detection by an event management tool, or supplier notification. A Problem is commonly detected when the cause of one or more Incidents reported to the service desk is unknown. The service desk may have resolved the Incident but, because the underlying root cause is still unknown and the Incident may occur again, it creates a Problem record. In other cases, it may be clear to the service desk that a reported Incident is associated with a Problem. If that Problem has already been recorded (a Known Problem), the Incident can be linked to the existing Problem record. If the Problem has not been recorded, a Problem record should be created immediately to help assure service performance.
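The link-or-create decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ProblemRecord`, `ProblemRegister`, the `PRB-` numbering scheme, and matching on a normalized symptom string are all hypothetical and do not come from ITIL or any particular ITSM tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProblemRecord:
    """Hypothetical, minimal Problem record; real tools store far more detail."""
    problem_id: str
    symptom: str  # normalized symptom text used for matching (an assumption)
    incident_ids: List[str] = field(default_factory=list)


class ProblemRegister:
    """In-memory stand-in for the Problem records held in an ITSM tool."""

    def __init__(self) -> None:
        self._by_symptom: Dict[str, ProblemRecord] = {}

    def link_or_create(self, incident_id: str, symptom: str) -> ProblemRecord:
        key = symptom.strip().lower()
        problem = self._by_symptom.get(key)
        if problem is None:
            # No recorded Problem matches, so open a new record immediately.
            problem = ProblemRecord(f"PRB-{len(self._by_symptom) + 1:04d}", key)
            self._by_symptom[key] = problem
        # In both cases the Incident ends up linked to a Problem record.
        problem.incident_ids.append(incident_id)
        return problem


register = ProblemRegister()
first = register.link_or_create("INC-101", "VPN drops every hour")
second = register.link_or_create("INC-102", "vpn drops every hour ")
third = register.link_or_create("INC-103", "Printer queue stuck")
```

Here the first two Incidents land on the same Problem record, while the third opens a new one. A production tool would match on categories, Configuration Items, or analyst judgment rather than on raw text.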
2) Problem Logging
In order to maintain a complete historical record, all Problems, regardless of how they were identified and reported to the service desk, must be logged with all relevant details, including date/time, user information, description, related Configuration Item from the CMDB, associated Incidents, resolution details and closure information.
- Categorization - Once a Problem is logged, all appropriate categories must be selected so that it can be properly assigned and escalated, and so that Problem frequencies and trends can be monitored.
- Prioritization - Assigning priority is critical in determining how and when the Problem will be handled by staff. Priority is determined by impact (the number of associated Incidents, which gives insight into the number of affected users and the effect on the business) and by urgency (how quickly a resolution is required).
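As a rough illustration of how impact and urgency might combine into a priority, the sketch below uses a simple additive matrix. The three-level scale, the incident-count thresholds, and the 1 to 5 priority range are assumptions chosen for the example; each organization defines its own.

```python
from enum import IntEnum


class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def impact_from_incident_count(count: int) -> Level:
    """Assumed thresholds: 10+ linked Incidents is high, 3 to 9 is medium."""
    if count >= 10:
        return Level.HIGH
    if count >= 3:
        return Level.MEDIUM
    return Level.LOW


def priority(impact: Level, urgency: Level) -> int:
    """Return a priority from 1 (handle first) to 5 (handle last)."""
    # The two levels sum to a value in 2..6; subtracting 1 maps that to 1..5.
    return int(impact) + int(urgency) - 1


# A Problem with 12 linked Incidents (high impact) and medium urgency.
p = priority(impact_from_incident_count(12), Level.MEDIUM)
```

With these assumed thresholds, `p` evaluates to 2. Many tools use a lookup table instead of arithmetic so that individual cells of the matrix can be tuned.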
3) Investigation and Diagnosis
An investigation into the root cause of the Problem will take place based on the impact, severity and urgency of the Problem in question. Common investigation techniques include reviewing the Known Error Database (KEDB) to find matching Problems and resolutions, and recreating the failure to determine the cause.
4) Workaround
In some situations it is possible to provide a temporary fix or workaround to the user experiencing the Incident related to the Problem. However, it’s important to seek a permanent resolution, implemented through a change, to the underlying error detected by Problem Management.
5) Create Known Error Record
Once the investigation and diagnosis are complete, it’s important to create a Known Error record. If future Incidents or Problems arise, the investigating service desk technician can identify them and provide a resolution more quickly by using the Known Error Database (KEDB) and its associated workaround(s).
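To make the KEDB lookup concrete, here is a minimal sketch of a Known Error record and a naive keyword search. The `KnownError` fields and the word-overlap scoring are illustrative assumptions; commercial KEDBs typically offer full-text or category-based search instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KnownError:
    """Hypothetical Known Error record: documented root cause plus workaround."""
    error_id: str
    symptoms: str
    root_cause: str
    workaround: str


def search_kedb(kedb: List[KnownError], description: str) -> List[KnownError]:
    """Return Known Errors sharing at least one word with the description,
    ordered so that the entry with the most shared words comes first."""
    words = set(description.lower().split())
    scored = []
    for entry in kedb:
        overlap = len(words & set(entry.symptoms.lower().split()))
        if overlap:
            scored.append((overlap, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


kedb = [
    KnownError("KE-1", "email client crashes on startup",
               "corrupt profile", "recreate the mail profile"),
    KnownError("KE-2", "vpn connection drops intermittently",
               "expired gateway certificate", "reconnect via backup gateway"),
]
matches = search_kedb(kedb, "VPN drops after login")
```

In this example only the second entry matches, so the technician could offer its workaround while a permanent fix is pursued.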
6) Resolution
Once a resolution has been identified, it can be implemented using the standard change procedure and tested to confirm service recovery. If a normal change is required, however, an associated Request for Change (RFC) must be raised and approved before the resolution is applied to the Problem.
7) Closure
Following confirmation that the Error has been resolved, the Problem and any associated Incidents can be closed. The service desk technician should ensure that the initial classification details are accurate for future reference and reporting.
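The resolution and closure rules above (a standard change can proceed directly, a normal change needs an approved RFC, and closing a Problem also closes its Incidents) can be expressed as a small sketch. The `Problem` class, the status strings, and the `RFC-` identifier format are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Problem:
    problem_id: str
    incident_ids: List[str] = field(default_factory=list)
    status: str = "open"
    rfc_id: str = ""  # populated only when a normal change is required


def resolve(problem: Problem, change_type: str, rfc_approved: bool = False) -> None:
    """Apply a fix via a standard change, or via an approved RFC for a normal change."""
    if change_type == "normal":
        # A normal change always gets an RFC raised, approved or not.
        problem.rfc_id = f"RFC-{problem.problem_id}"
        if not rfc_approved:
            raise PermissionError("normal change requires an approved RFC")
    elif change_type != "standard":
        raise ValueError(f"unknown change type: {change_type}")
    problem.status = "resolved"


def close(problem: Problem, incident_status: Dict[str, str]) -> None:
    """Close the Problem and every associated Incident once the fix is confirmed."""
    if problem.status != "resolved":
        raise RuntimeError("cannot close a Problem that is not resolved")
    for incident_id in problem.incident_ids:
        incident_status[incident_id] = "closed"
    problem.status = "closed"


incidents = {"INC-1": "open", "INC-2": "open"}
prb = Problem("PRB-7", ["INC-1", "INC-2"])
resolve(prb, "normal", rfc_approved=True)
close(prb, incidents)
```

Keeping the approval check inside `resolve` means a fix cannot be marked as applied before the RFC is approved, which mirrors the ordering the text describes.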
***Major Problem Review - Major Problems are defined by an organization’s business impact analysis (BIA) and risk assessment (RA), which determine response and priority (impact, urgency and severity of the Problem). The goal of a major Problem review is to continually improve the Problem Management process for responding to major business issues. A review may identify things done correctly, things done incorrectly, what can be improved, additional risks, how to prevent recurrence, and the nature of any third party’s responsibility. This review should not live in a silo; it should be shared with team members as part of training and awareness sessions.
***Problem Control and Error Control – In some situations the terms Problem Control and Error Control may be used during the Problem Management lifecycle. Problem Control can be incorporated into the investigation phase with the goal of finding the root cause of the problem and turning it into a known error. This helps the service desk technician provide temporary workarounds to the user. Error Control on the other hand is part of the resolution phase with the goal of converting known errors into solutions and removing them from the known error database (KEDB) when necessary.
Problem Management Roles and Responsibilities
Well-defined roles and responsibilities are critical to the effective execution of a successful Problem Management process. The Problem Management team is made up of the following:
1) Problem Manager
A Problem Manager is a designated person who may or may not be responsible for other organizational roles. This owner of the Problem Management process is responsible for all aspects of its coordination, including:
- Acting as the liaison with personnel responsible for Problem resolution
- Ensuring Problems are resolved within their SLA
- Ownership and management of the Known Error Database (KEDB)
- Closure of Problems
- Coordinating major Problem review
Note: The Problem Manager and Incident Manager should not be the same person because of possible conflicts in execution focus.
2) Problem Solving Team
Problems may be solved by internal technical support team members or by external suppliers or vendors. When a serious or major Problem occurs, the Problem Manager may form a dedicated Problem Management team made up of resources with specific expertise.
Problem Management Key Performance Indicators (KPIs)
Measurements are important across all stages of the ITIL life cycle in order to quantify overall success. To determine the effectiveness of Problem Management, businesses must identify Objectives, Key Performance Indicators (KPIs) and Critical Success Factors (CSFs). These may be different for each company. A good starting place for determining Problem Management CSFs and KPIs is to identify the current and future improvement objectives of Problem Management. The Objectives should support the Goals, Mission and Vision of the organization for Operational Effectiveness. Objectives, CSFs, and KPIs vary based on the maturity of the process. Typical Problem Management metrics to consider include:
- Problems reported by (category, organizational unit, person, etc.)
- Problems resolved within SLA targets
- Percentage of Problems exceeding SLA targets
- Trends associated with Problem backlog
- Average cost of managing a Problem
- Root Cause Analysis (RCA) report
Although each organization is different, relevant reports for users, staff and management will help support important decisions that can be used to continually improve both the processes and the business as a whole.
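Two of the metrics listed above, the percentage of Problems exceeding SLA targets and the size of the Problem backlog, are straightforward to compute from Problem records. The sketch below assumes a hypothetical `ProblemStat` shape with a per-Problem SLA target; real reporting would pull these fields from the ITSM tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ProblemStat:
    opened: datetime
    resolved: Optional[datetime]  # None while the Problem is still in the backlog
    sla: timedelta                # assumed per-Problem resolution target


def pct_exceeding_sla(problems: List[ProblemStat]) -> float:
    """Percentage of resolved Problems whose resolution time exceeded the SLA."""
    resolved = [p for p in problems if p.resolved is not None]
    if not resolved:
        return 0.0
    breached = sum(1 for p in resolved if p.resolved - p.opened > p.sla)
    return 100.0 * breached / len(resolved)


def backlog_size(problems: List[ProblemStat]) -> int:
    """Number of Problems that have not yet been resolved."""
    return sum(1 for p in problems if p.resolved is None)


t0 = datetime(2024, 1, 1)
sample = [
    ProblemStat(t0, t0 + timedelta(days=2), timedelta(days=5)),   # within SLA
    ProblemStat(t0, t0 + timedelta(days=10), timedelta(days=5)),  # breached
    ProblemStat(t0, None, timedelta(days=5)),                     # still open
]
breach_pct = pct_exceeding_sla(sample)
open_count = backlog_size(sample)
```

For this sample, half of the resolved Problems breached their target and one Problem remains in the backlog. Tracking `backlog_size` at regular intervals gives the backlog trend.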
Feature Checklist for Problem Management Software
For IT organizations evaluating Problem Management software and/or IT service management suites that offer Problem Management capabilities, the following features are important, if not critical, for effectively supporting key processes.
At a minimum, Problem Management software should enable administrators to:
- Configure problem processes
- Configure incident categorization
- Create, modify, resolve, and close problem records
- Implement ITIL or other industry best practice frameworks
- Automatically update status or close all related incidents upon problem update/closure
- Integrate with incident, change, configuration, and knowledge management
- Automate problem creation based on business rules and SLAs
- Document and manage knowledge artifacts associated with problems and known errors
- View impacted CIs from within a problem record
- Track work time
- Link problems to CIs, incidents, and change requests
- Assign impact and urgency to a problem
- Differentiate between problems and known errors
- Automatically or manually assign tasks to individuals or teams
- Automate recording of historical data in an audit log
- Link with third party knowledge base
- Use flexible field configurations, including free text, drop down, date/time, attachments, and screen captures
- Create templates for recurring problems
- Search for solutions, workarounds, and known errors
- Document root cause analysis
- Generate unique record numbers associated with each problem record
- Search for and report on problems
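As one example of what "automate problem creation based on business rules" might look like, the sketch below flags any Configuration Item (CI) with a threshold number of Incidents as a candidate for a new problem record. The rule, the default threshold of three, and the function name are assumptions for illustration, not a feature of any specific product.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cis_needing_problem(
    incidents: Iterable[Tuple[str, str]], threshold: int = 3
) -> List[str]:
    """Given (incident_id, ci_name) pairs, return the CIs whose incident
    count meets the threshold, sorted alphabetically."""
    counts = Counter(ci for _, ci in incidents)
    return sorted(ci for ci, n in counts.items() if n >= threshold)


recent = [
    ("INC-1", "mail-server"),
    ("INC-2", "mail-server"),
    ("INC-3", "vpn-gateway"),
    ("INC-4", "mail-server"),
]
candidates = cis_needing_problem(recent)
```

Here only `mail-server` reaches the threshold. A real rule would usually also restrict the count to a time window and skip CIs that already have an open problem record.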