ISO 20000 Service Continuity and Availability Management

Overview

Create service continuity and availability management plans per Clause 6.3 of ISO/IEC 20000-1:2011. This skill guides business impact analysis, availability target definition, continuity strategy selection, disaster recovery planning, and testing schedules to ensure services can be maintained or restored within agreed timeframes.

Continuity and availability planning answers a question auditors routinely ask: “What happens when this service goes down, and how fast can you recover it?” Without documented RTO (recovery time objective) and RPO (recovery point objective) values, continuity strategies, and tested recovery procedures, organizations struggle to demonstrate conformity with this clause. This skill starts with the business impact analysis and then builds the strategies and testing schedule on its findings, so that every critical service has a documented recovery path.

For a step-by-step walkthrough, see the ISO 20000 SMS workspace tutorial.

What It Covers

Business impact analysis – criticality classification, RTO, RPO, and MTPD (maximum tolerable period of disruption) per service
Availability targets – uptime percentages, service hours, planned maintenance windows, single points of failure
Continuity strategies – backup, redundancy, failover, alternative processing, and data replication
Service continuity plan – invocation criteria, escalation, recovery steps, and communication plan
Testing schedule – tabletop exercises, failover tests, full DR tests, and backup restoration verification
Continuity-availability register – structured JSON covering BIA, strategies, and testing schedule
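Availability targets are easier to agree on when the uptime percentage is translated into a downtime budget. The sketch below is a minimal sketch under stated assumptions. It treats availability as (agreed service time - downtime) / agreed service time and excludes planned maintenance inside the agreed window. The 99.9% and 99.5% targets, the service-hour figures, and the function names are illustrative and hypothetical. None of them come from any plan.

```python
# Minimal sketch: translate an availability target into a downtime budget.
# Assumption: availability is measured over agreed service hours, and planned
# maintenance inside the agreed window does not count as downtime.


def downtime_budget_minutes(availability_pct: float, service_hours: float) -> float:
    """Maximum unplanned downtime (minutes) allowed over the given service hours."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return service_hours * 60 * (1 - availability_pct / 100)


def measured_availability_pct(service_hours: float, downtime_minutes: float) -> float:
    """Availability = (agreed service time - downtime) / agreed service time."""
    agreed_minutes = service_hours * 60
    return 100 * (agreed_minutes - downtime_minutes) / agreed_minutes


# 24x7 service over a 30-day month = 720 service hours (illustrative)
print(round(downtime_budget_minutes(99.9, 720), 1))  # 43.2
# Business-hours service: 22 days x 10 hours = 220 service hours (illustrative)
print(round(downtime_budget_minutes(99.5, 220), 1))  # 66.0
```

The same percentage gives a much smaller budget for a business-hours service than for a 24x7 one. This is why service hours and maintenance windows belong next to the uptime target.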

Example Conversation

You: We need a service continuity and availability plan for our IT services. We have 5 services in our catalog and some basic backups, but nothing formally documented.
Agent: I’ll build the continuity and availability plan. I found your service catalog. Let me start with the business impact analysis – for each service, I need to understand what happens if it goes down:
Which services would cause the most business harm if unavailable for more than a few hours?
Do any services have regulatory or contractual uptime requirements?
What recovery capabilities do you currently have? (backups, redundant systems, DR site)
You: ERP and email are most critical – ERP being down stops order processing and finance, email disrupts everything. We have daily backups to a secondary data center and the ERP database replicates to a standby server. No formal DR site though.
Agent: Let me propose a business impact analysis:
Service	Criticality	RTO	RPO	Impact if Down
ERP System	Critical	2 hrs	1 hr	Order processing stops, financial close delayed
Email Service	Critical	4 hrs	1 hr	Communication disrupted across all departments
File Storage	High	8 hrs	4 hrs	Document access lost, collaboration impacted
VPN Service	High	4 hrs	N/A	Remote workers unable to access systems
Intranet Portal	Medium	1 business day	24 hrs	Internal comms delayed, not business-critical
The ERP has the tightest RTO because the revenue impact of stopped order processing is immediate. Does this match your business expectations?
You: Yes, that looks right. What continuity strategies do you recommend?
Agent: Based on the BIA and your existing capabilities:
ERP – You already have database replication to a standby. I recommend documenting an automated failover procedure and meeting the 1-hour RPO with hourly transaction log shipping instead of relying on daily backups alone.
Email – Active-passive failover with async replication. If your email is cloud-hosted, verify the provider’s SLA covers your 4-hour RTO.
File Storage – Daily backup with a 4-hour restore from the secondary data center, which fits within the 8-hour RTO. Daily backups alone cannot meet the 4-hour RPO, so add incremental backups every 4 hours.
VPN – Redundant VPN concentrator pair. If one fails, traffic routes to the standby automatically.
Intranet – Daily backup, restore from backup if needed. Low priority given 1 business day RTO.
For testing, I recommend quarterly tabletop exercises, semi-annual failover tests for ERP and email, and monthly backup restoration verification. Let me build the register and validate it…
Result: continuity_plan_validator – PASS. BIA completed for all 5 services, RTO/RPO defined for all critical services, continuity strategies documented, and testing schedule present.
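The strategy recommendations above rest on a simple comparison. The worst-case data loss for a backup or replication scheme is roughly the interval between recovery points, so an RPO is met only if that interval is no longer than the RPO. The minimal sketch below assumes that simplification and ignores replication lag, backup duration, and restore time. It uses the intervals discussed in the conversation. `meets_rpo` is a hypothetical helper, not part of the skill.

```python
from typing import Optional


def meets_rpo(interval_hours: float, rpo_hours: Optional[float]) -> bool:
    """True if the gap between recovery points is within the RPO.

    Simplification: a failure just before the next recovery point loses one
    full interval of data, so worst-case loss equals interval_hours.
    """
    if rpo_hours is None:  # RPO not applicable (e.g. the VPN service)
        return True
    return interval_hours <= rpo_hours


# (scenario, RPO in hours from the BIA, interval between recovery points in hours)
scenarios = [
    ("ERP System, daily backup alone", 1, 24),
    ("ERP System, hourly log shipping", 1, 1),
    ("Email Service, hourly backup", 1, 1),
    ("File Storage, daily backup alone", 4, 24),
    ("File Storage, plus 4-hourly incrementals", 4, 4),
    ("Intranet Portal, daily backup", 24, 24),
]

for name, rpo, interval in scenarios:
    status = "OK" if meets_rpo(interval, rpo) else "GAP"
    print(f"{name}: every {interval} h vs RPO {rpo} h -> {status}")
```

Only the two "daily backup alone" rows report a gap. Those are the cases where hourly log shipping and 4-hourly incrementals were recommended.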

Sample Output Excerpt

The skill produces a structured JSON register covering BIA, continuity strategies, and the testing schedule. Here is a representative excerpt.

{
  "bia": [
    {
      "service_id": "SVC-002",
      "service_name": "ERP System",
      "criticality": "critical",
      "rto_hours": 2,
      "rpo_hours": 1,
      "mtpd_hours": 8,
      "business_impact": "Order processing stops, financial close delayed, revenue loss of approx. 50K/hour",
      "dependencies": ["SVC-003 (File Storage)", "Database infrastructure", "Network"]
    },
    {
      "service_id": "SVC-001",
      "service_name": "Email Service",
      "criticality": "critical",
      "rto_hours": 4,
      "rpo_hours": 1,
      "mtpd_hours": 12,
      "business_impact": "Communication disrupted across all departments, customer notifications delayed",
      "dependencies": ["Network", "DNS", "Email provider SaaS"]
    }
  ],
  "continuity_strategies": [
    {
      "service_id": "SVC-002",
      "strategy": "Database replication to standby with automated failover",
      "failover_type": "automated",
      "backup_frequency": "hourly transaction log shipping",
      "recovery_procedure": "Automated failover to standby; manual verification within 30 min",
      "alternative_processing": "Manual order entry via phone queue"
    },
    {
      "service_id": "SVC-001",
      "strategy": "Active-passive failover with async replication to secondary MX",
      "failover_type": "automated",
      "backup_frequency": "hourly",
      "recovery_procedure": "DNS failover to secondary MX; mail queue drains within 1 hour"
    }
  ],
  "testing_schedule": [
    {
      "test_type": "tabletop_exercise",
      "frequency": "quarterly",
      "scope": "All critical and high services",
      "owner": "Service Continuity Manager"
    },
    {
      "test_type": "failover_test",
      "frequency": "semi-annual",
      "scope": "ERP System, Email Service",
      "owner": "Infrastructure Lead"
    },
    {
      "test_type": "backup_restoration",
      "frequency": "monthly",
      "scope": "All services with backup strategy",
      "owner": "Backup Administrator"
    }
  ]
}
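The testing schedule records frequencies but not dates. The sketch below is a hypothetical illustration of turning those frequencies into a calendar. It maps each frequency label to a number of months and computes the next due date from an assumed last-run date. The mapping, the `next_due` and `add_months` helpers, and the date are assumptions, not fields of the register.

```python
import calendar
import json
from datetime import date

# Hypothetical mapping from the register's frequency labels to months.
FREQUENCY_MONTHS = {"monthly": 1, "quarterly": 3, "semi-annual": 6}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_due(last_run: date, frequency: str) -> date:
    """Next due date for a test, given when it last ran and its frequency."""
    return add_months(last_run, FREQUENCY_MONTHS[frequency])


# Frequencies copied from the register excerpt; other fields omitted.
schedule = json.loads("""
[
  {"test_type": "tabletop_exercise", "frequency": "quarterly"},
  {"test_type": "failover_test", "frequency": "semi-annual"},
  {"test_type": "backup_restoration", "frequency": "monthly"}
]
""")

last_run = date(2024, 1, 31)  # illustrative date; the register has no last-run field
for entry in schedule:
    print(entry["test_type"], next_due(last_run, entry["frequency"]))
```

For a last run on 31 January 2024, this prints 2024-04-30, 2024-07-31, and 2024-02-29. Clamping to the end of the month avoids invalid dates such as 31 April.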

Extension Tools

`continuity_plan_validator`

Validates continuity-availability.json against ISO/IEC 20000-1:2011 Clause 6.3 requirements:

Check	What It Does
BIA coverage	All services in scope must have a business impact analysis entry
RTO/RPO defined	Every critical and high-criticality service must have numeric RTO and RPO values
Criticality classification	Each BIA entry must include a criticality level (critical, high, medium, low)
Continuity strategies	Critical services must have a documented continuity strategy
Testing schedule	At least one test type must be defined with frequency, scope, and owner
RTO vs MTPD	Warns if RTO equals or exceeds the maximum tolerable period of disruption (MTPD)
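The checks in the table can be approximated in a few lines. This is a hypothetical re-implementation for illustration, not the source of `continuity_plan_validator`. It makes three assumptions:

- The register follows the layout of the sample excerpt above.
- "Numeric" means a JSON number.
- Errors and warnings can be returned as plain strings.

```python
import json

CRITICALITY_LEVELS = {"critical", "high", "medium", "low"}


def load_register(path: str) -> dict:
    """Read a register such as continuity-availability.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def _is_number(value) -> bool:
    # JSON numbers load as int/float; exclude bool, which subclasses int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(register: dict, services_in_scope: list[str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a continuity-availability register (sketch)."""
    errors, warnings = [], []
    bia = {entry["service_id"]: entry for entry in register.get("bia", [])}
    strategies = {s["service_id"] for s in register.get("continuity_strategies", [])}

    # BIA coverage: every in-scope service needs an entry.
    for sid in services_in_scope:
        if sid not in bia:
            errors.append(f"{sid}: missing BIA entry")

    for sid, entry in bia.items():
        level = entry.get("criticality")
        # Criticality classification
        if level not in CRITICALITY_LEVELS:
            errors.append(f"{sid}: invalid or missing criticality")
        # RTO/RPO defined for critical and high services
        if level in ("critical", "high"):
            for field in ("rto_hours", "rpo_hours"):
                if not _is_number(entry.get(field)):
                    errors.append(f"{sid}: {field} must be numeric")
        # Continuity strategy for critical services
        if level == "critical" and sid not in strategies:
            errors.append(f"{sid}: critical service has no continuity strategy")
        # RTO vs MTPD is a warning, not an error
        rto, mtpd = entry.get("rto_hours"), entry.get("mtpd_hours")
        if _is_number(rto) and _is_number(mtpd) and rto >= mtpd:
            warnings.append(f"{sid}: RTO {rto} h >= MTPD {mtpd} h")

    # Testing schedule: at least one test with frequency, scope, and owner.
    tests = register.get("testing_schedule", [])
    if not any(all(t.get(k) for k in ("frequency", "scope", "owner")) for t in tests):
        errors.append("testing schedule: no test with frequency, scope, and owner")

    return errors, warnings


sample = {
    "bia": [
        {"service_id": "SVC-002", "criticality": "critical",
         "rto_hours": 2, "rpo_hours": 1, "mtpd_hours": 8},
        {"service_id": "SVC-001", "criticality": "critical",
         "rto_hours": 4, "rpo_hours": 1, "mtpd_hours": 12},
    ],
    "continuity_strategies": [{"service_id": "SVC-002"}, {"service_id": "SVC-001"}],
    "testing_schedule": [{"test_type": "failover_test", "frequency": "semi-annual",
                          "scope": "ERP System, Email Service",
                          "owner": "Infrastructure Lead"}],
}
print(validate(sample, ["SVC-001", "SVC-002"]))  # ([], [])
```

Under this reading, this sketch would flag a high-criticality service whose RPO is recorded as N/A, such as the VPN service, unless the register stores a numeric value for it.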

Getting Started

Start by activating the ISO 20000 Service Continuity and Availability Management skill. The agent will check for your service catalog and SLA register, then guide you through the business impact analysis for each service before building strategies and a testing schedule.

Have this information ready:

Your service catalog and SLA register (if already created)
Business impact estimates for each service – what happens operationally and financially if it goes down
Current backup strategy, recovery capabilities, and any existing DR infrastructure
Regulatory or contractual requirements around uptime and recovery
Key personnel responsible for continuity planning and disaster recovery

The BIA results and continuity strategies documented here feed into the service reporting skill for ongoing monitoring and into management review evidence for audit readiness.