
Introducing ITIL®: Problem Management

In this series of articles on implementing ITIL® processes, we have so far discussed setting up the Service Desk function and the Incident Management process. Continuing the series, we will take a look at the process that closely complements Incident Management: Problem Management.

Purpose and Definitions

To recall our earlier definition: a Problem is the unknown root cause of one or more Incidents; once the underlying cause is known, it is classified as a Known Error. The goal of Problem Management is to identify and resolve the root causes of Incidents and to prevent those Incidents from recurring.

To illustrate the relationship between the two processes again, think of Incident Management as firefighting: the goal is to extinguish the fire, and a Major Incident requires multiple fire departments to put it out. Problem Management is then charged with determining the cause of the fire in order to prevent future fires.

Responsibilities

The Problem Management process is responsible for minimizing the business impact of Incidents, Problems and Known Errors. This includes resolving existing Problems and Known Errors, and analyzing trends so that future occurrences can be prevented proactively. By finding and eliminating the root causes of Incidents, Problem Management offers permanent solutions and so reduces overall Incident volume. By giving the Service Desk insight into Known Errors and their workarounds, it also reduces the time needed to restore service, so end users experience less disruption. The Service Desk can also resolve more calls and Incidents directly, at first contact (the first-time resolution rate is often used as a key performance measure).

When an Incident is resolved (i.e., service has been restored), the Incident Record is closed. The Service Desk can then create a new Problem Record if the cause of the Incident is unknown, or link the Incident to related Incidents or to an existing Known Error. As in Incident Management, Problem Management prioritizes Problem and Known Error Records by impact and urgency. The Problems that cause the most Incidents or have the greatest impact are investigated first to determine their root cause. Once the root cause is identified, the Problem becomes a Known Error, and a solution is designed to remove that root cause. Known Errors are resolved through the Change Management process, which we will discuss in a future article.
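The record flow described above (open Problem, then Known Error, then resolution through a Change) can be sketched as a small state model. This is a minimal illustration only: the `ProblemRecord` class, its status names and the priority score (impact multiplied by the number of linked Incidents) are hypothetical choices, not something ITIL prescribes, and real ITSM tools implement this flow in their own ways.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ProblemStatus(Enum):
    # Hypothetical status values; real ITSM tools define their own.
    OPEN = "open"                # root cause still unknown
    KNOWN_ERROR = "known_error"  # root cause identified, workaround may exist
    RESOLVED = "resolved"        # permanent fix implemented via Change Management


@dataclass
class ProblemRecord:
    problem_id: str
    impact: int  # assumed scale: 1 (low) to 3 (high)
    incident_ids: list[str] = field(default_factory=list)
    status: ProblemStatus = ProblemStatus.OPEN
    root_cause: Optional[str] = None
    workaround: Optional[str] = None

    def link_incident(self, incident_id: str) -> None:
        """Associate a closed Incident with this Problem (ignores duplicates)."""
        if incident_id not in self.incident_ids:
            self.incident_ids.append(incident_id)

    def record_root_cause(self, cause: str, workaround: Optional[str] = None) -> None:
        """Once the root cause is identified, the Problem becomes a Known Error."""
        if self.status is not ProblemStatus.OPEN:
            raise ValueError("a root cause can only be recorded on an open Problem")
        self.root_cause = cause
        self.workaround = workaround
        self.status = ProblemStatus.KNOWN_ERROR

    def mark_resolved(self) -> None:
        """Call after the permanent fix has gone through Change Management."""
        if self.status is not ProblemStatus.KNOWN_ERROR:
            raise ValueError("only a Known Error can be marked resolved")
        self.status = ProblemStatus.RESOLVED

    def priority_score(self) -> int:
        # Assumed heuristic: weight impact by the number of linked Incidents.
        return self.impact * len(self.incident_ids)


def investigation_queue(problems: list[ProblemRecord]) -> list[ProblemRecord]:
    """Return open Problems, highest priority score first."""
    open_problems = [p for p in problems if p.status is ProblemStatus.OPEN]
    return sorted(open_problems, key=lambda p: p.priority_score(), reverse=True)
```

Multiplying impact by Incident count is just one way to rank Problems; many organizations use an impact/urgency priority matrix instead, and the scoring function is the only part that would need to change.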

Considerations

The success of the Problem Management process depends first and foremost on a well-implemented Incident Management process. It also depends on other critical factors, such as an up-to-date Configuration Management Database (CMDB) and a knowledge base of Known Errors and workarounds. When Problem Management is first implemented, most activity, as with Incident Management, is likely to be reactive. For example, the initial focus may be on the Top 10 Incidents of the previous week or month. As the process matures, activity can shift toward proactively detecting Incident trends and patterns.
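For the reactive starting point described above, a "Top 10" report can be produced by counting recent Incidents per Configuration Item. The sketch assumes each Incident record carries an opening timestamp and a CMDB Configuration Item identifier; the `Incident` class and its field names are hypothetical, and grouping by Configuration Item is only one option (Incident category or symptom would work the same way).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    incident_id: str
    opened: datetime
    config_item: str  # hypothetical grouping key, e.g. a CMDB Configuration Item


def top_incident_sources(
    incidents: list[Incident],
    now: datetime,
    window: timedelta = timedelta(days=7),
    n: int = 10,
) -> list[tuple[str, int]]:
    """Count Incidents per Configuration Item opened within the window; return the n most frequent."""
    cutoff = now - window
    counts = Counter(i.config_item for i in incidents if i.opened >= cutoff)
    # most_common() orders by count, highest first
    return counts.most_common(n)
```

Passing `window=timedelta(days=30)` gives the monthly variant; the Configuration Items that top the list are natural candidates for new Problem Records.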

It is also important that the Service Desk team is properly trained to raise Problem Records and Known Errors from Incident Records, and to spot Incident trends that may indicate Problems. The output of Incident Management is the input for Problem Management, so accurate, high-quality information is essential to the success of both. While the two processes are complementary, they can also conflict, for example by competing for Service Desk resources and availability. Because IT serves the business, Incident Management and the restoration of service should always take priority.

Benefits and Measuring Performance

Well-implemented Incident Management and Problem Management should improve service to users through fewer Incidents (service interruptions), permanent solutions, and better insight into the IT infrastructure across the entire organization. These processes can both drive continuous improvement and benefit directly from it.

An important component of both continuous improvement and management reporting is measuring the efficiency, accuracy and effectiveness of Incident Management and Problem Management. Some metrics to consider:

  • Percentage of Incidents resolved at first contact (using a Known Error and its workaround)
  • Average time to restore service
  • Number of Problems resolved versus number of open Problems
  • Percentage of Incidents with correct first-time assignment (inaccurate assignment hinders analysis)
  • Customer Satisfaction (survey per closed call)
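Several of these metrics can be computed directly from closed Incident Records. As a minimal sketch, assuming each record stores its opening and restoration times plus two hypothetical flags (resolved at first contact, and reassigned after the first assignment), three of the listed measures could be calculated as follows. The `ClosedIncident` class and its field names are illustrative only; Problem counts and customer satisfaction surveys are not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ClosedIncident:
    opened: datetime
    restored: datetime
    resolved_first_contact: bool  # hypothetical flag set by the Service Desk
    reassigned: bool              # True if the first assignment group was wrong


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when there is no data."""
    return 100.0 * part / whole if whole else 0.0


def incident_metrics(incidents: list[ClosedIncident]) -> dict[str, object]:
    total = len(incidents)
    first_contact = sum(i.resolved_first_contact for i in incidents)
    correct_assignment = sum(not i.reassigned for i in incidents)
    restore_times = [i.restored - i.opened for i in incidents]
    # timedelta supports summation from a zero start value and division by an int
    avg_restore = sum(restore_times, timedelta()) / total if total else timedelta()
    return {
        "first_contact_resolution_pct": percentage(first_contact, total),
        "avg_time_to_restore": avg_restore,
        "correct_first_assignment_pct": percentage(correct_assignment, total),
    }
```

To match the first metric more closely, the first-contact count could be restricted to Incidents linked to a Known Error.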

In the next article in this Introducing ITIL series, we will look at implementing the Change Management process.


About this author:

Angel Prusinowski

Angel is a leading ITIL® instructor at Ashford Global IT.
