IT Disaster Recovery Plan (ITDRP)
Overview
Draft comprehensive IT Disaster Recovery Plans with validated technical runbooks for system restoration, following ISO 27031 (ICT Readiness for Business Continuity). The skill guides the agent through collecting your infrastructure context, mapping system dependencies, sequencing restoration steps, and verifying the entire plan with automated tools that catch dependency deadlocks and stale emergency contacts before a real disaster hits.
Getting Started
Activate the IT Disaster Recovery Plan (ITDRP) skill and describe your infrastructure — critical systems, hosting model, and recovery objectives. The agent will build a step-by-step recovery runbook, populate machine-readable data files for the recovery sequence and contact matrix, and validate both with automated tools.
Example Conversation
You: We need a DR plan for our SaaS platform. It runs on AWS — core services are the API gateway, Postgres on RDS, Redis cache, and a React frontend on CloudFront. RTO is 4 hours for everything.
Agent: I’ll structure the recovery plan around your architecture. First, I need to map the dependency chain. Does the API gateway depend on anything besides Postgres and Redis? And is there an identity layer (SSO, IAM) that other services require?
You: Yes, we use Auth0 for SSO. The API gateway needs Auth0 and Postgres. The frontend needs the API gateway and CloudFront. Redis is used for session caching but the app can start without it.
Agent: Got it. I’ve built the recovery sequence in recovery-plan.json: six steps, from network/DNS through to the customer-facing layer. Let me validate the ordering now…
After building the recovery plan data, the agent runs validation:
Agent: I ran failover_logic_validator on the recovery plan. Here’s the report:
- ERROR: Step 3 (Restore API Gateway) requires Auth0, but Auth0 is only restored in Step 4 — the API gateway can’t start without its identity provider
- WARNING: Step 5 (Restore Redis) is marked critical but has no estimated duration
I’ll fix the ordering by moving Auth0 restoration before the API gateway. Should Redis stay as critical priority, or can we downgrade it since the app works without it?
What the Tools Validate
The skill includes two automated validation tools that run against structured JSON data files:
failover_logic_validator checks recovery step ordering:
- No step requires a system that is only restored by a later step (the core dependency-deadlock check)
- No step both requires and restores the same system (self-reference)
- Every required system is restored by some step (or explicitly documented as surviving)
- Steps sharing the same execution order have no mutual dependencies
- Critical-priority steps include estimated durations per ISO 27031
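The checks listed above can be approximated in a few lines. What follows is a minimal sketch, not the tool's actual implementation: the field names (order, name, restores, requires, priority, duration_hours) and the sample plan, including its durations, are assumptions made for illustration, loosely modeled on the example conversation. The sketch also treats any dependency between same-order steps as a conflict, on the assumption that such steps run in parallel.

```python
from collections import defaultdict

def validate_recovery_steps(steps, surviving=()):
    """Return (severity, message) findings for a recovery sequence."""
    findings = []
    # Earliest order value at which each system is restored.
    restored_at = {}
    for step in steps:
        for system in step["restores"]:
            restored_at[system] = min(step["order"], restored_at.get(system, step["order"]))

    for step in steps:
        label = f"Step {step['order']} ({step['name']})"
        for system in step["requires"]:
            if system in step["restores"]:
                findings.append(("ERROR", f"{label} both requires and restores {system}"))
            elif system in surviving:
                continue  # documented as surviving the disaster
            elif system not in restored_at:
                findings.append(("ERROR", f"{label} requires {system}, which no step restores"))
            elif restored_at[system] > step["order"]:
                findings.append(("ERROR", f"{label} requires {system}, restored only in step {restored_at[system]}"))
        if step.get("priority") == "critical" and step.get("duration_hours") is None:
            findings.append(("WARNING", f"{label} is critical but has no estimated duration"))

    # Steps sharing an order value are assumed to run in parallel,
    # so none of them may depend on a system restored by another.
    by_order = defaultdict(list)
    for step in steps:
        by_order[step["order"]].append(step)
    for order, group in by_order.items():
        for a in group:
            for b in group:
                if a is not b and set(a["requires"]) & set(b["restores"]):
                    findings.append(("ERROR", f"Step {order}: {a['name']} depends on parallel step {b['name']}"))
    return findings

# Hypothetical plan reproducing the two findings from the example conversation.
plan = [
    {"order": 1, "name": "Restore network/DNS", "restores": ["dns"], "requires": [],
     "priority": "critical", "duration_hours": 2},
    {"order": 2, "name": "Restore Postgres", "restores": ["postgres"], "requires": ["dns"],
     "priority": "critical", "duration_hours": 3},
    {"order": 3, "name": "Restore API Gateway", "restores": ["api-gateway"],
     "requires": ["auth0", "postgres"], "priority": "critical", "duration_hours": 1},
    {"order": 4, "name": "Restore Auth0", "restores": ["auth0"], "requires": ["dns"],
     "priority": "critical", "duration_hours": 1},
    {"order": 5, "name": "Restore Redis", "restores": ["redis"], "requires": ["dns"],
     "priority": "critical"},
]

for severity, message in validate_recovery_steps(plan):
    print(f"{severity}: {message}")
```

Run against this sample, the sketch reports one error for Step 3 (Auth0 is restored too late) and one warning for Step 5 (no duration). Systems passed in the surviving argument are skipped, which mirrors the "explicitly documented as surviving" exemption.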
contact_matrix_authenticator audits the emergency contact list:
- Vendor support contracts with expired dates
- Placeholder or dummy data (TBD, N/A, test emails, fake phone numbers)
- Missing phone numbers (voice contact is essential for emergency response)
- Missing escalation paths for primary contacts
- Stale verification dates (contacts not re-verified in 6+ months)
- Vendor entries without SLA response times or contract references
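The contact audit can be sketched in the same spirit. This is an illustrative approximation under stated assumptions rather than the tool's real logic: the field names (type, role, escalates_to, last_verified, contract_expires, sla_response_hours, contract_ref), the placeholder heuristics, the 183-day cutoff standing in for "6 months", and both sample contacts are hypothetical.

```python
import re
from datetime import date

# Hypothetical placeholder markers; a real audit would likely use a longer list.
PLACEHOLDER_WORDS = {"tbd", "n/a", "na", "todo", "none"}

def looks_placeholder(value):
    text = value.strip().lower()
    if text in PLACEHOLDER_WORDS or text.startswith("test@") or text.endswith("@example.com"):
        return True
    digits = re.sub(r"\D", "", text)
    # Crude fake-phone heuristic: seven or more digits that are all identical.
    return len(digits) >= 7 and len(set(digits)) == 1

def audit_contacts(contacts, today, max_age_days=183):
    """Return human-readable findings; 183 days approximates six months."""
    findings = []
    for c in contacts:
        who = c.get("name") or "(unnamed)"
        for field in ("email", "phone", "contract_ref"):
            if c.get(field) and looks_placeholder(c[field]):
                findings.append(f"{who}: placeholder value in {field}")
        if not c.get("phone"):
            findings.append(f"{who}: missing phone number")
        if c.get("role") == "primary" and not c.get("escalates_to"):
            findings.append(f"{who}: primary contact has no escalation path")
        verified = c.get("last_verified")
        if not verified or (today - date.fromisoformat(verified)).days >= max_age_days:
            findings.append(f"{who}: verification date is missing or stale")
        if c.get("type") == "vendor":
            expires = c.get("contract_expires")
            if expires and date.fromisoformat(expires) < today:
                findings.append(f"{who}: support contract expired on {expires}")
            if c.get("sla_response_hours") is None or not c.get("contract_ref"):
                findings.append(f"{who}: vendor entry lacks SLA response time or contract reference")
    return findings

# Hypothetical contact matrix: one clean internal entry, one problematic vendor entry.
contacts = [
    {"name": "Dana (On-call Lead)", "type": "internal", "role": "primary",
     "email": "oncall-lead@corp.internal", "phone": "+1-206-555-0142",
     "escalates_to": "CTO", "last_verified": "2025-04-15"},
    {"name": "CloudHost Support", "type": "vendor", "role": "primary",
     "email": "test@example.com", "phone": "", "escalates_to": None,
     "last_verified": "2024-09-01", "contract_expires": "2025-01-31",
     "sla_response_hours": None, "contract_ref": "TBD"},
]

# A fixed "today" keeps the example deterministic.
for finding in audit_contacts(contacts, today=date(2025, 6, 1)):
    print(finding)
```

Passing the reference date explicitly, instead of calling date.today() inside the function, keeps the audit reproducible and easy to test.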
Output Excerpt
The final ITDRP document includes a six-phase technical runbook with validated step ordering, a contact matrix, a communication plan, and a testing schedule:
Phase 4: Execution — System Restoration
Step 1: Restore core network (DNS, DHCP, NTP)
Restores: core-network, dns | Requires: (none) | Duration: 2h | Team: NetOps
Step 2: Restore identity layer (Auth0 SSO)
Restores: auth0-sso | Requires: core-network, dns | Duration: 1h | Team: IAM
Step 3: Restore data tier (Postgres RDS, Redis)
Restores: postgres-rds, redis-cache | Requires: core-network | Duration: 3h | Team: DBA
Step 4: Restore API gateway and app servers
Restores: api-gateway, app-server | Requires: auth0-sso, postgres-rds | Duration: 1h | Team: Platform
Step 5: Restore customer-facing services (CloudFront, React app)
Restores: web-frontend, cdn | Requires: api-gateway, dns | Duration: 0.5h | Team: Platform
All step dependencies are validated — no step starts before the systems it needs are already running.
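The ordering claim can be replayed against the five steps in the excerpt, and the same data gives a rough elapsed-time estimate to compare with a recovery time objective. The sketch below is illustrative only: the dictionary layout and function names are assumptions, and the parallel estimate assumes a step can start as soon as everything it requires is back, with no contention between teams.

```python
# The five steps from the excerpt, transcribed into a hypothetical structure.
steps = [
    {"step": 1, "restores": ["core-network", "dns"], "requires": [], "hours": 2.0},
    {"step": 2, "restores": ["auth0-sso"], "requires": ["core-network", "dns"], "hours": 1.0},
    {"step": 3, "restores": ["postgres-rds", "redis-cache"], "requires": ["core-network"], "hours": 3.0},
    {"step": 4, "restores": ["api-gateway", "app-server"], "requires": ["auth0-sso", "postgres-rds"], "hours": 1.0},
    {"step": 5, "restores": ["web-frontend", "cdn"], "requires": ["api-gateway", "dns"], "hours": 0.5},
]

def check_order(steps):
    """Return (step, system) pairs where a requirement is not yet restored."""
    available, problems = set(), []
    for s in steps:  # steps are assumed to be listed in execution order
        problems += [(s["step"], r) for r in s["requires"] if r not in available]
        available.update(s["restores"])
    return problems

def elapsed_hours(steps):
    """Return (strictly sequential total, total if independent steps overlap)."""
    sequential = sum(s["hours"] for s in steps)
    ready_at = {}  # system -> hour at which it is back online
    for s in steps:
        start = max((ready_at[r] for r in s["requires"]), default=0.0)
        for system in s["restores"]:
            ready_at[system] = start + s["hours"]
    return sequential, max(ready_at.values())

print(check_order(steps))    # [] means every requirement is met in time
print(elapsed_hours(steps))  # (7.5, 6.5)
```

With the durations shown in the excerpt, the run takes 7.5 hours if the steps execute strictly one after another and 6.5 hours if Steps 2 and 3 overlap. Either figure can then be compared with whatever RTO applies to the systems involved.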