ISO 20000 Problem Management


Overview

Build a problem management process that meets ISO/IEC 20000-1:2011 Clause 8.2. This skill guides you through defining reactive and proactive problem identification methods, establishing root cause analysis techniques, creating the known error database approach, and documenting how problem management integrates with incident and change management.

Problem management is what separates a reactive IT organization from a mature one. While incident management restores service, problem management eliminates root causes so incidents stop recurring. An auditor will expect to see defined RCA techniques, a known error database approach, clear triggers for when incidents become problems, and integration with change management for implementing permanent fixes.

For a step-by-step walkthrough, see the ISO 20000 SMS workspace tutorial.

What It Covers

Problem identification – reactive (incident trends, major incident reviews) and proactive (monitoring, vendor advisories)
Problem lifecycle – identification through closure with defined stages
Root cause analysis techniques – 5 Whys, Ishikawa, Fault Tree Analysis with selection criteria
Known error database (KEDB) – structure, search procedures, review and retirement criteria
Incident management integration – trigger thresholds, major incident auto-creation, workaround feedback
Change management integration – RFC triggers, post-implementation verification
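The incident trigger rule is simple enough to express as a single predicate. Below is a minimal Python sketch. It assumes the values agreed in the example conversation: 3 occurrences of the same incident type within a rolling 30 days, and every P1 opening a problem record. It also assumes incidents have already been matched to a common type. `should_open_problem` and its parameters are hypothetical names, not part of the skill.

```python
from datetime import datetime, timedelta

# Assumed values from the example conversation.
TRIGGER_THRESHOLD = 3                # occurrences of the same incident type...
TRIGGER_WINDOW = timedelta(days=30)  # ...within a rolling 30-day window


def should_open_problem(occurrences: list[datetime], priority: str, now: datetime) -> bool:
    """Decide whether an incident should raise a problem record (hypothetical helper).

    occurrences: timestamps of incidents already matched to the same type,
                 including the current incident.
    priority:    priority of the current incident, e.g. "P1".
    """
    # Every major incident (P1) opens a problem record automatically.
    if priority == "P1":
        return True
    # Otherwise count only occurrences inside the rolling window ending at `now`.
    window_start = now - TRIGGER_WINDOW
    recent = [t for t in occurrences if window_start <= t <= now]
    return len(recent) >= TRIGGER_THRESHOLD


now = datetime(2024, 6, 30)
history = [now - timedelta(days=20), now - timedelta(days=5), now]
print(should_open_problem(history, "P3", now))  # True: 3 occurrences in 30 days
```

A rolling window, rather than calendar months, avoids missing a cluster of incidents that straddles a month boundary.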

Example Conversation

You: We need to establish a problem management process for our ISO 20000 certification. We currently handle incidents but don’t have a formal way to track root causes or prevent recurring issues.
Agent: That’s a common gap. I’ll help you build a problem management process for Clause 8.2. Let me start with how problems are identified. There are two approaches:
Reactive – problems detected from incident patterns:
When the same incident type reaches a threshold number of occurrences
After every major incident (P1) review
When a workaround is applied repeatedly
Proactive – problems detected before they cause incidents:
Infrastructure monitoring trend analysis
Vendor security advisories and known issues
Capacity and performance trend reviews
What threshold should trigger a problem record from repeated incidents? Common values are 3 or 5 occurrences within 30 days.
You: Let’s use 3 occurrences in 30 days. And every P1 incident should automatically generate a problem record.
Agent: Good. Now let’s define your root cause analysis techniques. ISO 20000 doesn’t mandate a specific method, but you need documented techniques with guidance on when to use each:
Technique	Best Used When
5 Whys	Single-cause problems, quick investigation needed
Ishikawa (Fishbone)	Multiple potential causes, team brainstorming session
Fault Tree Analysis	Complex failures with multiple contributing factors
Do you want to adopt all three, or focus on a subset?
You: All three. Our L2 engineers can run 5 Whys for straightforward issues, but we want Ishikawa for anything that affected multiple services and Fault Tree for anything that caused a major incident.
Agent: Now let’s establish your known error database (KEDB) approach. When a problem has an identified root cause but no permanent fix yet, it becomes a known error with a documented workaround. The KEDB structure I’d recommend:
Problem ID – unique identifier
Symptoms – how the issue manifests (so the service desk can match incidents)
Root cause – what was found during RCA
Workaround – temporary resolution steps
Permanent fix status – planned, in progress, or implemented
How often should known errors be reviewed for retirement?
You: Quarterly. And known errors with a permanent fix implemented and verified should be retired automatically.
Agent: I’ll run the problem_process_validator now…
Result: All 5 checks passed. 8 lifecycle stages defined. 3 RCA techniques documented with selection criteria. KEDB approach present with structure and quarterly review. Incident integration documented (3-occurrence threshold, major incident auto-creation). Change integration documented with RFC requirement and post-implementation review.
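The RCA selection rules agreed in the conversation can be encoded as a small decision function. This is a minimal sketch. `select_rca_technique` and its inputs are hypothetical. The sketch also assumes that when a problem is both a major incident and multi-service, the major incident rule takes precedence, which the conversation does not state explicitly.

```python
def select_rca_technique(major_incident: bool, services_affected: int) -> str:
    """Pick an RCA technique using the rules from the example conversation.

    Hypothetical helper. Assumes a major incident takes precedence over
    multi-service impact when both apply.
    """
    if major_incident:
        # Anything that caused a major incident gets Fault Tree Analysis.
        return "Fault Tree Analysis"
    if services_affected > 1:
        # Anything that affected multiple services gets an Ishikawa session.
        return "Ishikawa (Fishbone)"
    # Straightforward, single-service issues: L2 engineers run 5 Whys.
    return "5 Whys"


print(select_rca_technique(major_incident=False, services_affected=3))  # Ishikawa (Fishbone)
```

The returned strings match the technique names used in the `rca_techniques` section of the process file, so they can be used for lookups.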


Sample Output Excerpt

The skill produces a structured JSON process file and a Markdown process document. Here is a representative excerpt from the JSON.

{
  "lifecycle_stages": [
    "identification", "logging", "categorization", "prioritization",
    "investigation", "rca", "resolution", "closure"
  ],
  "identification_methods": {
    "reactive": [
      "Incident trend analysis (3+ occurrences in 30 days)",
      "Major incident (P1) post-review",
      "Repeated workaround application"
    ],
    "proactive": [
      "Infrastructure monitoring trend analysis",
      "Vendor security advisories",
      "Capacity and performance trend reviews"
    ]
  },
  "rca_techniques": [
    { "name": "5 Whys", "when_to_use": "Single-cause problems requiring quick investigation" },
    { "name": "Ishikawa (Fishbone)", "when_to_use": "Multi-service impact, team brainstorming required" },
    { "name": "Fault Tree Analysis", "when_to_use": "Complex failures from major incidents" }
  ],
  "kedb_approach": {
    "structure": ["problem_id", "symptoms", "root_cause", "workaround", "permanent_fix_status"],
    "review_frequency": "quarterly",
    "retirement_criteria": "Permanent fix implemented and verified"
  },
  "incident_integration": {
    "trigger_threshold": "3 occurrences in 30 days",
    "major_incident_auto_create": true,
    "workaround_feedback": true
  },
  "change_integration": {
    "rfc_required_when": "Permanent fix requires infrastructure or application change",
    "post_implementation_review": true
  }
}
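The `kedb_approach` block implies two routine operations. Known errors whose permanent fix is implemented and verified are retired. The remaining entries are flagged for quarterly review. The minimal Python sketch below makes three assumptions:

- The first five fields mirror `structure`.
- `fix_verified` and `last_reviewed` are hypothetical extra fields not present in the sample.
- "Quarterly" is approximated as 91 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class FixStatus(Enum):
    PLANNED = "planned"
    IN_PROGRESS = "in progress"
    IMPLEMENTED = "implemented"


@dataclass
class KnownError:
    # The first five fields mirror kedb_approach.structure in the excerpt.
    problem_id: str
    symptoms: str
    root_cause: str
    workaround: str
    permanent_fix_status: FixStatus
    # Hypothetical fields, not part of the sample structure.
    fix_verified: bool = False
    last_reviewed: Optional[date] = None


# Assumption: "quarterly" approximated as 91 days.
REVIEW_INTERVAL = timedelta(days=91)


def is_retirable(ke: KnownError) -> bool:
    # Retirement criterion: permanent fix implemented and verified.
    return ke.permanent_fix_status is FixStatus.IMPLEMENTED and ke.fix_verified


def triage_kedb(kedb: list[KnownError], today: date):
    """Split the KEDB into (active, retired, due_for_review)."""
    retired = [ke for ke in kedb if is_retirable(ke)]
    active = [ke for ke in kedb if not is_retirable(ke)]
    # Active entries never reviewed, or last reviewed a quarter or more ago.
    due = [
        ke for ke in active
        if ke.last_reviewed is None or today - ke.last_reviewed >= REVIEW_INTERVAL
    ]
    return active, retired, due
```

Retirement is checked independently of the review date, matching the decision in the conversation that verified fixes retire automatically rather than waiting for the next quarterly review.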

Extension Tools

`problem_process_validator`

Validates problem-process.json for ISO 20000-1 Clause 8.2 completeness. Run after drafting the problem management process.

Check	What It Does
Lifecycle stages	Confirms all required stages are present (identification, logging, investigation, resolution, closure)
RCA techniques	Verifies at least one technique is documented with name and selection criteria
KEDB approach	Checks that the known error database structure and review frequency are defined
Incident integration	Validates trigger threshold, major incident auto-creation, and workaround feedback are documented
Change integration	Confirms RFC trigger criteria and post-implementation review are present
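The actual implementation of `problem_process_validator` is not shown here. To illustrate what the five checks amount to, the following hypothetical Python approximation reads the key names from the sample JSON excerpt. The exact pass/fail rules are assumptions, not the tool's behaviour. For example, a boolean flag counts as documented when its key is present.

```python
import json

# Stages the Lifecycle check requires, per the check table above.
REQUIRED_STAGES = {"identification", "logging", "investigation", "resolution", "closure"}


def validate_problem_process(process: dict) -> dict[str, bool]:
    """Hypothetical approximation of the five checks (not the real tool)."""
    stages = set(process.get("lifecycle_stages", []))
    rca = process.get("rca_techniques", [])
    kedb = process.get("kedb_approach", {})
    inc = process.get("incident_integration", {})
    chg = process.get("change_integration", {})
    return {
        # All required stages present; extra stages such as "rca" are allowed.
        "Lifecycle stages": REQUIRED_STAGES <= stages,
        # At least one technique, each with a name and selection criteria.
        "RCA techniques": len(rca) >= 1
        and all(bool(t.get("name")) and bool(t.get("when_to_use")) for t in rca),
        # KEDB structure and review frequency defined.
        "KEDB approach": bool(kedb.get("structure")) and bool(kedb.get("review_frequency")),
        # Threshold set; the two flags count as documented if their keys exist.
        "Incident integration": bool(inc.get("trigger_threshold"))
        and "major_incident_auto_create" in inc
        and "workaround_feedback" in inc,
        # RFC trigger criteria and post-implementation review present.
        "Change integration": bool(chg.get("rfc_required_when"))
        and "post_implementation_review" in chg,
    }


def load_and_validate(path: str) -> dict[str, bool]:
    # Read a process file such as problem-process.json and run the checks.
    with open(path, encoding="utf-8") as f:
        return validate_problem_process(json.load(f))
```

Calling `load_and_validate("problem-process.json")` returns a dict that maps each check name to True or False. This mirrors the pass/fail summary reported in the example conversation.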

Getting Started

Activate the ISO 20000 Problem Management skill after you have the incident management process in place – problem management builds on incident data and integrates tightly with it. Have your incident categories and escalation structure ready so the agent can align problem identification triggers.

The completed problem process connects incident trends to root cause elimination and feeds permanent fixes into change management, closing the loop on recurring service disruptions.