Overview
Build a problem management process that meets ISO/IEC 20000-1:2011 Clause 8.2. This skill guides you through defining reactive and proactive problem identification methods, establishing root cause analysis techniques, creating the known error database approach, and documenting how problem management integrates with incident and change management.
Problem management is what separates a reactive IT organization from a mature one. While incident management restores service, problem management eliminates root causes so incidents stop recurring. An auditor will expect to see defined RCA techniques, a known error database approach, clear triggers for when incidents become problems, and integration with change management for implementing permanent fixes.
For a step-by-step walkthrough, see the ISO 20000 SMS workspace tutorial.
What It Covers
- Problem identification – reactive (incident trends, major incident reviews) and proactive (monitoring, vendor advisories)
- Problem lifecycle – identification through closure with defined stages
- Root cause analysis techniques – 5 Whys, Ishikawa, Fault Tree Analysis with selection criteria
- Known error database (KEDB) – structure, search procedures, review and retirement criteria
- Incident management integration – trigger thresholds, major incident auto-creation, workaround feedback
- Change management integration – RFC triggers, post-implementation verification
Example Conversation
You: We need to establish a problem management process for our ISO 20000 certification. We currently handle incidents but don’t have a formal way to track root causes or prevent recurring issues.
Agent: That’s a common gap. I’ll help you build a problem management process for Clause 8.2. Let me start with how problems are identified. There are two approaches:
Reactive – problems detected from incident patterns:
- When the same incident type reaches a threshold number of occurrences within a set period
- After every major incident (P1) review
- When a workaround is applied repeatedly
Proactive – problems detected before they cause incidents:
- Infrastructure monitoring trend analysis
- Vendor security advisories and known issues
- Capacity and performance trend reviews
What threshold should trigger a problem record from repeated incidents? Common values are 3 or 5 occurrences within 30 days.
You: Let’s use 3 occurrences in 30 days. And every P1 incident should automatically generate a problem record.
Agent: Good. Now let’s define your root cause analysis techniques. ISO 20000 doesn’t mandate a specific method, but you need documented techniques with guidance on when to use each:
| Technique | Best Used When |
|---|---|
| 5 Whys | Single-cause problems, quick investigation needed |
| Ishikawa (Fishbone) | Multiple potential causes, team brainstorming session |
| Fault Tree Analysis | Complex failures with multiple contributing factors |

Do you want to adopt all three, or focus on a subset?
You: All three. Our L2 engineers can run 5 Whys for straightforward issues, but we want Ishikawa for anything that affected multiple services and Fault Tree for anything that caused a major incident.
Agent: Now let’s establish your known error database (KEDB) approach. When a problem has an identified root cause but no permanent fix yet, it becomes a known error with a documented workaround. The KEDB structure I’d recommend:
- Problem ID – unique identifier
- Symptoms – how the issue manifests (so service desk can match incidents)
- Root cause – what was found during RCA
- Workaround – temporary resolution steps
- Permanent fix status – planned, in progress, or implemented
How often should known errors be reviewed for retirement?
You: Quarterly. And known errors with a permanent fix implemented and verified should be retired automatically.
Agent: I’ll run the problem_process_validator now…
Result: All 5 checks passed. 8 lifecycle stages defined. 3 RCA techniques documented with selection criteria. KEDB approach present with structure and quarterly review. Incident integration documented (3-occurrence threshold, major incident auto-creation). Change integration documented with RFC requirement and post-implementation review.
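The technique selection rules agreed in the conversation (5 Whys for straightforward issues, Ishikawa for multi-service impact, Fault Tree Analysis for major incidents) could be encoded roughly as follows. This is a minimal sketch, not part of the skill: the `Problem` record, its field names, and the rule that a major incident takes precedence over multi-service impact are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical problem record; field names are illustrative only."""
    problem_id: str
    affected_services: int
    from_major_incident: bool


def select_rca_technique(problem: Problem) -> str:
    """Pick an RCA technique using the selection criteria from the conversation."""
    # Assumption: a major incident outranks multi-service impact when both apply.
    if problem.from_major_incident:
        return "Fault Tree Analysis"
    if problem.affected_services > 1:
        return "Ishikawa (Fishbone)"
    # Default for single-service, non-major problems.
    return "5 Whys"
```

Writing the precedence down explicitly matters because a problem can meet more than one criterion at once; the documented process should state which technique wins in that case.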
Sample Output Excerpt
The skill produces a structured JSON process file and a markdown process document. Here is a representative excerpt from the JSON.
{
  "lifecycle_stages": [
    "identification", "logging", "categorization", "prioritization",
    "investigation", "rca", "resolution", "closure"
  ],
  "identification_methods": {
    "reactive": [
      "Incident trend analysis (3+ occurrences in 30 days)",
      "Major incident (P1) post-review",
      "Repeated workaround application"
    ],
    "proactive": [
      "Infrastructure monitoring trend analysis",
      "Vendor security advisories",
      "Capacity and performance trend reviews"
    ]
  },
  "rca_techniques": [
    { "name": "5 Whys", "when_to_use": "Single-cause problems requiring quick investigation" },
    { "name": "Ishikawa (Fishbone)", "when_to_use": "Multi-service impact, team brainstorming required" },
    { "name": "Fault Tree Analysis", "when_to_use": "Complex failures from major incidents" }
  ],
  "kedb_approach": {
    "structure": ["problem_id", "symptoms", "root_cause", "workaround", "permanent_fix_status"],
    "review_frequency": "quarterly",
    "retirement_criteria": "Permanent fix implemented and verified"
  },
  "incident_integration": {
    "trigger_threshold": "3 occurrences in 30 days",
    "major_incident_auto_create": true,
    "workaround_feedback": true
  },
  "change_integration": {
    "rfc_required_when": "Permanent fix requires infrastructure or application change",
    "post_implementation_review": true
  }
}
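To make the `trigger_threshold` of "3 occurrences in 30 days" concrete, the sketch below shows one way such a rule might be evaluated against incident data. It is an illustration under stated assumptions, not the skill's implementation: the function name, the `(category, opened_at)` input shape, and the choice to treat the 30-day boundary as inclusive are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_problem_candidates(incidents, threshold=3, window_days=30):
    """Return incident categories with at least `threshold` incidents
    inside any span of `window_days` days.

    `incidents` is an iterable of (category, opened_at) tuples, where
    opened_at is a datetime. This input shape is an assumption.
    """
    by_category = defaultdict(list)
    for category, opened_at in incidents:
        by_category[category].append(opened_at)

    window = timedelta(days=window_days)
    candidates = set()
    for category, times in by_category.items():
        times.sort()
        # Compare each incident with the one (threshold - 1) positions later:
        # if both fall inside the window, so do all incidents between them.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                candidates.add(category)
                break
    return candidates
```

A sliding comparison over sorted timestamps is used instead of counting per calendar month, so a cluster that straddles a month boundary still triggers a problem record.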
Extension Tools
problem_process_validator
Validates problem-process.json for ISO 20000-1 Clause 8.2 completeness. Run after drafting the problem management process.
| Check | What It Does |
|---|---|
| Lifecycle stages | Confirms all required stages are present (identification, logging, investigation, resolution, closure) |
| RCA techniques | Verifies at least one technique is documented with name and selection criteria |
| KEDB approach | Checks that the known error database structure and review frequency are defined |
| Incident integration | Validates trigger threshold, major incident auto-creation, and workaround feedback are documented |
| Change integration | Confirms RFC trigger criteria and post-implementation review are present |
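The five checks in the table can be approximated in a few lines of Python. The sketch below is not the actual `problem_process_validator`; it is a hypothetical reimplementation that assumes the process file uses the same key names as the sample JSON excerpt, and the function name and return shape are invented for illustration.

```python
# Stages listed as required in the check table above.
REQUIRED_STAGES = {"identification", "logging", "investigation", "resolution", "closure"}


def validate_problem_process(process: dict) -> dict:
    """Return a mapping of check name to pass/fail for a parsed process file."""
    stages = set(process.get("lifecycle_stages", []))
    techniques = process.get("rca_techniques", [])
    kedb = process.get("kedb_approach", {})
    incident = process.get("incident_integration", {})
    change = process.get("change_integration", {})

    return {
        # Every required stage must appear; extra stages are allowed.
        "lifecycle_stages": REQUIRED_STAGES <= stages,
        # At least one technique, each with a name and selection criteria.
        "rca_techniques": len(techniques) >= 1
        and all(t.get("name") and t.get("when_to_use") for t in techniques),
        # KEDB structure and review frequency must both be defined.
        "kedb_approach": bool(kedb.get("structure")) and bool(kedb.get("review_frequency")),
        # All three incident integration settings must be documented.
        "incident_integration": all(
            key in incident
            for key in ("trigger_threshold", "major_incident_auto_create", "workaround_feedback")
        ),
        # RFC trigger criteria and post-implementation review must be present.
        "change_integration": all(
            key in change for key in ("rfc_required_when", "post_implementation_review")
        ),
    }
```

Returning one boolean per check, rather than a single pass/fail, mirrors the table and makes it clear which part of the process document still needs work.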
Getting Started
Activate the ISO 20000 Problem Management skill after you have the incident management process in place – problem management builds on incident data and integrates tightly with it. Have your incident categories and escalation structure ready so the agent can align problem identification triggers.
The completed problem process connects incident trends to root cause elimination and feeds permanent fixes into change management, closing the loop on recurring service disruptions.