# IT Disaster Recovery Plan (ITDRP)

> Draft IT Disaster Recovery Plans with validated technical runbooks for system restoration, following ISO 27031. Sequences recovery steps respecting system dependencies, validates failover logic to prevent dependency deadlocks, and audits the emergency contact matrix for expired vendor contracts and stale contact data.



Tags: Disaster Recovery, Business Continuity, ISO 27031, Infrastructure


## Example Prompts

- Draft an IT disaster recovery plan for our cloud-hosted infrastructure
- Validate our recovery runbook — check that step ordering respects system dependencies
- Audit our DR contact matrix for expired vendor contracts and stale contacts
- Create a technical runbook for restoring our on-prem data centre after a total outage

URL: https://rakenne.app/skills/it-disaster-recovery-plan/index.md

Try this skill: https://rakenne.app/a/?skill=it-disaster-recovery-plan



## Overview

Draft comprehensive IT Disaster Recovery Plans with validated technical runbooks for system restoration, following ISO 27031 (ICT Readiness for Business Continuity). The skill guides the agent through collecting your infrastructure context, mapping system dependencies, sequencing restoration steps, and verifying the entire plan with automated tools that catch dependency deadlocks and stale emergency contacts before a real disaster hits.

## Getting Started

Activate the *IT Disaster Recovery Plan (ITDRP)* skill and describe your infrastructure — critical systems, hosting model, and recovery objectives. The agent will build a step-by-step recovery runbook, populate machine-readable data files for the recovery sequence and contact matrix, and validate both with automated tools.
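As a rough illustration of what a machine-readable recovery sequence might look like, the sketch below builds a two-step plan and serializes it to JSON. The schema is not documented here, so every field name (`order`, `restores`, `requires`, `priority`, `duration_hours`, `team`) is an assumption made for this example.

```python
import json

# Hypothetical structure: all field names below are assumptions for
# illustration, not the skill's documented schema.
recovery_plan = {
    "steps": [
        {
            "order": 1,
            "name": "Restore core network",
            "restores": ["core-network", "dns"],
            "requires": [],
            "priority": "critical",
            "duration_hours": 2,
            "team": "NetOps",
        },
        {
            "order": 2,
            "name": "Restore identity layer",
            "restores": ["auth0-sso"],
            "requires": ["core-network", "dns"],
            "priority": "critical",
            "duration_hours": 1,
            "team": "IAM",
        },
    ]
}


def serialize_plan(plan):
    """Render the plan as indented JSON text, e.g. to save as recovery-plan.json."""
    return json.dumps(plan, indent=2)


print(serialize_plan(recovery_plan))
```

Keeping `restores` and `requires` as explicit lists of system names is what would let a validator reason about ordering without parsing prose.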

## Example Conversation

> **You:** We need a DR plan for our SaaS platform. It runs on AWS — core services are the API gateway, Postgres on RDS, Redis cache, and a React frontend on CloudFront. RTO is 4 hours for everything.

> **Agent:** I'll structure the recovery plan around your architecture. First, I need to map the dependency chain. Does the API gateway depend on anything besides Postgres and Redis? And is there an identity layer (SSO, IAM) that other services require?

> **You:** Yes, we use Auth0 for SSO. The API gateway needs Auth0 and Postgres. The frontend needs the API gateway and CloudFront. Redis is used for session caching but the app can start without it.

> **Agent:** Got it. I've built the recovery sequence in `recovery-plan.json` — six steps from network/DNS through to the customer-facing layer. Let me validate the ordering now...

After building the recovery plan data, the agent runs validation:

> **Agent:** I ran `failover_logic_validator` on the recovery plan. Here's the report:
> - **ERROR**: Step 3 (Restore API Gateway) requires Auth0, but Auth0 is only restored in Step 4 — the API gateway can't start without its identity provider
> - **WARNING**: Step 5 (Restore Redis) is marked critical but has no estimated duration
>
> I'll fix the ordering by moving Auth0 restoration before the API gateway. Should Redis stay as critical priority, or can we downgrade it since the app works without it?

## What the Tools Validate

The skill includes two automated validation tools that run against structured JSON data files:

**`failover_logic_validator`** checks recovery step ordering:
- No step requires a system that is only restored by a later step (the core dependency-deadlock check)
- No step both requires and restores the same system (self-reference)
- Every required system is restored by some step (or explicitly documented as surviving)
- Steps sharing the same execution order have no mutual dependencies
- Critical-priority steps include estimated durations per ISO 27031
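The ordering checks above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name, the step fields (`order`, `restores`, `requires`, `priority`, `duration_hours`), and the message wording are all assumptions. It also treats any dependency between steps that share an order as an error, which is a stricter reading than "mutual" dependencies.

```python
def validate_recovery_steps(steps, surviving=()):
    """Return (severity, message) findings for a recovery sequence.

    `surviving` lists systems documented as unaffected by the outage.
    """
    findings = []

    # Earliest order at which each system is restored.
    restored_at = {}
    for step in steps:
        for system in step["restores"]:
            restored_at[system] = min(step["order"],
                                      restored_at.get(system, step["order"]))

    for step in steps:
        order = step["order"]
        for system in step["requires"]:
            if system in step["restores"]:
                findings.append(("ERROR", f"Step {order} both requires and restores {system}"))
            elif system in surviving:
                continue
            elif system not in restored_at:
                findings.append(("ERROR", f"Step {order} requires {system}, which no step restores"))
            elif restored_at[system] > order:
                findings.append(("ERROR", f"Step {order} requires {system}, "
                                          f"restored only in step {restored_at[system]}"))
            elif restored_at[system] == order:
                # Restored by a different step that shares this execution order.
                findings.append(("ERROR", f"Step {order} requires {system}, "
                                          f"restored by a step with the same order"))
        if step.get("priority") == "critical" and step.get("duration_hours") is None:
            findings.append(("WARNING", f"Step {order} is critical but has no estimated duration"))
    return findings


# Mirrors the mis-ordered draft from the example conversation.
example_steps = [
    {"order": 1, "restores": ["core-network", "dns"], "requires": [],
     "priority": "critical", "duration_hours": 2},
    {"order": 2, "restores": ["postgres-rds"], "requires": ["core-network"],
     "priority": "critical", "duration_hours": 3},
    {"order": 3, "restores": ["api-gateway"], "requires": ["auth0-sso", "postgres-rds"],
     "priority": "critical", "duration_hours": 1},
    {"order": 4, "restores": ["auth0-sso"], "requires": ["core-network", "dns"],
     "priority": "critical", "duration_hours": 1},
    {"order": 5, "restores": ["redis-cache"], "requires": ["core-network"],
     "priority": "critical", "duration_hours": None},
]

findings = validate_recovery_steps(example_steps)
for severity, message in findings:
    print(f"{severity}: {message}")
```

On this input the sketch reports one error (the API gateway step requires Auth0 before it is restored) and one warning (the Redis step lacks a duration), matching the shape of the report in the example conversation.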

**`contact_matrix_authenticator`** audits the emergency contact list and flags:
- Vendor support contracts with expired dates
- Placeholder or dummy data (TBD, N/A, test emails, fake phone numbers)
- Missing phone numbers (voice contact is essential for emergency response)
- Missing escalation paths for primary contacts
- Stale verification dates (contacts not re-verified in 6+ months)
- Vendor entries without SLA response times or contract references
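A contact audit along these lines could look like the sketch below. It is an assumption-laden illustration rather than the tool's real logic: the field names (`type`, `is_primary`, `escalates_to`, `last_verified`, `contract_expires`, `sla_response`, `contract_ref`), the placeholder patterns, and the 183-day staleness threshold are all chosen for this example.

```python
import re
from datetime import date, timedelta

# Assumed placeholder tokens and staleness threshold (roughly six months).
PLACEHOLDER = re.compile(r"^(tbd|n/?a|todo|test|xxx)$", re.IGNORECASE)
STALE_AFTER = timedelta(days=183)


def audit_contact(contact, today):
    """Return a list of issue descriptions for one contact entry."""
    issues = []
    phone = (contact.get("phone") or "").strip()
    email = (contact.get("email") or "").strip()

    if not phone:
        issues.append("missing phone number")
    elif PLACEHOLDER.match(phone) or len(set(re.sub(r"\D", "", phone))) <= 1:
        # No digits at all, or one digit repeated (e.g. 000-000-0000).
        issues.append("placeholder phone number")

    if PLACEHOLDER.match(email) or email.lower().endswith(("@example.com", "@test.com")):
        issues.append("placeholder email")

    if contact.get("is_primary") and not contact.get("escalates_to"):
        issues.append("no escalation path")

    verified = contact.get("last_verified")
    if verified is None or today - date.fromisoformat(verified) > STALE_AFTER:
        issues.append("verification stale or missing")

    if contact.get("type") == "vendor":
        expiry = contact.get("contract_expires")
        if expiry and date.fromisoformat(expiry) < today:
            issues.append("vendor contract expired")
        if not contact.get("sla_response") or not contact.get("contract_ref"):
            issues.append("vendor entry lacks SLA response time or contract reference")
    return issues


# Illustrative entries only; names and values are made up for the example.
today = date(2025, 1, 15)
vendor = {
    "name": "Database vendor support", "type": "vendor",
    "phone": "TBD", "email": "support@example.com",
    "is_primary": True, "escalates_to": None,
    "last_verified": "2024-01-10", "contract_expires": "2024-06-30",
    "sla_response": "", "contract_ref": "C-1",
}
internal = {
    "name": "Ops lead", "type": "internal",
    "phone": "+44 20 7946 0958", "email": "oncall@internal.corp",
    "is_primary": True, "escalates_to": "CTO",
    "last_verified": "2024-12-01",
}

for entry in (vendor, internal):
    print(entry["name"], "->", audit_contact(entry, today) or "OK")
```

Passing `today` explicitly keeps the audit deterministic, which makes it easier to test against fixed dates.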

## Output Excerpt

The final ITDRP document includes a six-phase technical runbook with validated step ordering, a contact matrix, a communication plan, and a testing schedule. The excerpt below shows the system restoration phase:

```
Phase 4: Execution — System Restoration

Step 1: Restore core network (DNS, DHCP, NTP)
  Restores: core-network, dns | Requires: (none) | Duration: 2h | Team: NetOps

Step 2: Restore identity layer (Auth0 SSO)
  Restores: auth0-sso | Requires: core-network, dns | Duration: 1h | Team: IAM

Step 3: Restore data tier (Postgres RDS, Redis)
  Restores: postgres-rds, redis-cache | Requires: core-network | Duration: 3h | Team: DBA

Step 4: Restore API gateway and app servers
  Restores: api-gateway, app-server | Requires: auth0-sso, postgres-rds | Duration: 1h | Team: Platform

Step 5: Restore customer-facing services (CloudFront, React app)
  Restores: web-frontend, cdn | Requires: api-gateway, dns | Duration: 0.5h | Team: Platform
```

All step dependencies are validated: no step starts until every system it requires has been restored.
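The `Requires` and `Duration` fields in the excerpt also allow a rough timeline estimate. The sketch below is an illustration only, not something the listed validators are described as doing. It assumes that a step can start as soon as every system it requires is available, that independent steps run in parallel, and that steps are listed in a valid dependency order. The function name and field names are hypothetical.

```python
def earliest_finish_times(steps):
    """Return {step name: finish hour}, letting independent steps overlap."""
    ready_at = {}  # system -> hour at which it becomes available
    finish = {}
    for step in steps:
        # A step with no requirements starts at hour 0.
        start = max((ready_at[s] for s in step["requires"]), default=0.0)
        end = start + step["duration_hours"]
        finish[step["name"]] = end
        for system in step["restores"]:
            ready_at[system] = end
    return finish


# Values copied from the runbook excerpt above.
excerpt_steps = [
    {"name": "Step 1", "restores": ["core-network", "dns"],
     "requires": [], "duration_hours": 2},
    {"name": "Step 2", "restores": ["auth0-sso"],
     "requires": ["core-network", "dns"], "duration_hours": 1},
    {"name": "Step 3", "restores": ["postgres-rds", "redis-cache"],
     "requires": ["core-network"], "duration_hours": 3},
    {"name": "Step 4", "restores": ["api-gateway", "app-server"],
     "requires": ["auth0-sso", "postgres-rds"], "duration_hours": 1},
    {"name": "Step 5", "restores": ["web-frontend", "cdn"],
     "requires": ["api-gateway", "dns"], "duration_hours": 0.5},
]

finish = earliest_finish_times(excerpt_steps)
sequential_total = sum(step["duration_hours"] for step in excerpt_steps)

print(f"Parallel estimate: {max(finish.values())} h")
print(f"Strictly sequential: {sequential_total} h")
```

With the excerpt's durations this gives 6.5 hours when Steps 2 and 3 overlap and 7.5 hours when run strictly in sequence, so an estimate like this is worth comparing against the stated RTO.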


