# Redaction tool (healthcare case studies)

> Pre-publication PHI-redaction reviewer-aid for Rakenne-generated healthcare artifacts (letters of medical necessity, appeal letters, peer-to-peer briefs, case.json, packet-index, cover-sheet) being prepared as anonymized case studies for external sharing. Surfaces the 18 HIPAA identifiers where heuristics are reliable, lets the LLM triage candidate names, applies reviewed redactions with relative DAY+n date anchoring, and blocks export when any reliably-detectable identifier remains. Not a HIPAA Safe Harbor or Expert Determination de-identification certification — compliance review is still required before publication.



Tags: Healthcare, HIPAA, Compliance, PHI, Redaction


## Example Prompts

- Redact this appeal letter so I can share the case with a peer specialist
- Prepare an anonymized copy of the LOMN for a pilot write-up
- Check whether this case study still contains any PHI before I publish it
- Sanitize the packet-index.md so we can show it to a prospective reference customer

URL: https://rakenne.app/skills/redaction-tool/index.md

Try this skill: https://rakenne.app/a/?skill=redaction-tool



## Overview

Winning a hard prior-auth or overturning a denial is the Rakenne moment a clinic wants to share with a peer specialist, a consultant, a prospective reference customer, or in a pilot write-up. The same moment is also a HIPAA breach waiting to happen: a single patient name, phone number, DOB, or member ID published externally can be a reportable incident, so the 18 HIPAA Safe Harbor identifiers have to come out of the artifact before anyone outside the clinic sees it.

This skill is the pre-publication gate for that specific workflow. It runs on finished Rakenne artifacts (LOMN drafts, appeal letters, peer-to-peer briefs, `case.json`, `packet-index.md`, `cover-sheet.md`), scans for the reliably-detectable subset of the 18 HIPAA identifiers, lets the LLM and the human reviewer triage candidate names, applies reviewed redactions to a separate sanitized copy, and re-audits that output. Dates are not dropped — they are anchored to an index date and rewritten as `DAY+n` offsets so the reader can still follow "DAY+0: first visit → DAY+14: prior therapy initiated → DAY+42: discontinuation" without ever seeing a calendar date. The skill is explicitly a reviewer-aid, not a Safe Harbor or Expert Determination certification; the clinic's compliance officer still reviews the output before publication.

## What it covers

- **The 18 HIPAA identifiers (Safe Harbor, 45 CFR 164.514(b)(2)).** Names; all geographic subdivisions smaller than a state (street address, city, county, ZIP code); all elements of dates except year, plus ages over 89; telephone numbers; fax numbers; email addresses; SSNs; MRNs; health-plan beneficiary numbers (member ID); account numbers; certificate and license numbers; vehicle identifiers and serial numbers, including license plates; device identifiers and serial numbers; URLs; IP addresses; biometric identifiers; full-face photographs and comparable images; and any other unique identifying number, characteristic, or code.
- **Reliable-set regex detection.** Phone, fax, email, SSN, 5-digit ZIP, explicit dates (ISO, US short, and long-form), `DOB:` labels, and number sequences introduced by `MRN`, `Medical Record Number`, `Member ID`, `Subscriber ID`, `Claim #`, `Authorization`, `Prior Auth`, `Reference Number`, `Case Number`, `Certificate Number`, and `Account Number` labels. URLs and IPs are reported as advisory.
- **Candidate-name triage.** Bare proper nouns are not reliably regexable. The scanner uses weak heuristics (`Mr./Ms./Dr. Capitalized Capitalized`, `Patient: …`, `Provider: …`, signature-block patterns) to surface candidates; the LLM and a human reviewer must triage each one. Non-Western names and single-token surnames are routinely missed by the heuristic and must be caught by the reviewer.
- **DAY+n date anchoring.** Every parseable date in the source is rewritten as an offset from a single `date_anchor` (`DAY+0`). A date 14 days later becomes `DAY+14`; 61 days earlier becomes `DAY-61`. Year-only references and free-text durations ("for six months") are preserved as written. Patients 90 or older collapse to the `90+` age bucket per Safe Harbor.
- **Sentinel-token vocabulary.** `[PATIENT]`, `[PATIENT-FAMILY]`, `[PROVIDER-{role}]`, `[STAFF-{role}]`, `[PAYER-REVIEWER]`, `[ID-REDACTED]`, `[CONTACT-REDACTED]`, `[ADDRESS-REDACTED]`, `DAY+n`, `90+`, and `US {region}`. The sentinels are stable across artifacts so downstream publishers and the Rakenne website pipeline recognise a redacted artifact by shape.
- **Source-preserving write.** `redact_phi_apply` refuses to overwrite `source_path` — the sanitized copy is always written to a different workspace path so the original stays available for the clinic's own records and for payer resubmission.
- **Audit gate that blocks publication.** `redact_phi_audit_check` masks the allowlisted sentinels, re-runs the scanner over the sanitized output, and returns `isError: true` if any reliable-category identifier (phone, fax, email, SSN, ZIP5, date, DOB, or an MRN/member/claim/auth/certificate/account-labeled number) remains. Calling workflows must not publish while the audit returns `isError: true`.
- **Preserved clinical substance.** Diagnoses, therapy names/doses/frequencies, ICD-10 / CPT / HCPCS / NDC codes, disease-score values (PASI, UAS7, ODI, DAS28, ECOG, PHQ-9), lab and imaging findings, outcome statements, specialty-society guideline references, payer denial-reason text, and medical-policy section references all stay — a case study that redacts the clinical content is useless.
- **Workspace-scoped paths.** Both `source_path` and `output_path` are resolved against the project workspace root and rejected if they escape it.
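
The DAY+n arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's implementation: the names `day_offset` and `anchor_dates` are hypothetical, and only ISO dates are handled here, whereas the scanner also recognises US short and long-form dates.

```python
import re
from datetime import date

# Assumption: only ISO (YYYY-MM-DD) literals are rewritten in this sketch.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def day_offset(anchor: date, d: date) -> str:
    """Format a date as a signed day offset from the anchor (anchor is DAY+0)."""
    delta = (d - anchor).days
    return f"DAY+{delta}" if delta >= 0 else f"DAY-{-delta}"


def anchor_dates(text: str, anchor: date) -> str:
    """Rewrite every parseable ISO date in text as a DAY+n / DAY-n offset."""

    def _rewrite(match: re.Match) -> str:
        year, month, day = (int(g) for g in match.groups())
        try:
            parsed = date(year, month, day)
        except ValueError:
            # Not a real calendar date: leave the literal for the reviewer.
            return match.group(0)
        return day_offset(anchor, parsed)

    return ISO_DATE.sub(_rewrite, text)


if __name__ == "__main__":
    print(anchor_dates("MRI dated 2026-02-03; letter 2026-04-21", date(2026, 4, 12)))
```

Year-only references such as "2019" and free-text durations do not match the pattern, so they pass through unchanged, which mirrors the preservation rule stated above.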

## Example conversation

> **You:** The Anthem L4-L5 fusion case we won on appeal is a great pilot write-up. Prep a redacted copy of `drafts/appeal-2026-0412-88174.md` so I can share it with a prospective reference customer.
>
> **Agent:** Confirming the source is a finished artifact — reading `drafts/appeal-2026-0412-88174.md`. It is the rendered appeal letter, audit-log entry was written, `clinical_claim_check_draft` returned `export_ready: true` on 2026-04-21. Not under active revision. External audience is a prospective reference customer — treating as a case-study publication, not an internal share.
>
> **Agent:** Running `redact_phi_scan`. Reliable-category matches: 1 patient name (`Jane D.`), 2 phone numbers (`(208) 555-0144`, `(208) 555-0145`), 1 fax label (`Fax: (877) 278-2163`), 1 provider NPI block, 1 member ID (`YRK8827461`), 1 claim number (`C-9918274461`), 1 payer reference (`CA-2026-0412-88174`), 1 DOB (`1968-05-14`), 7 dates (`2026-04-21`, `2026-04-12`, `2026-03-28`, `2026-03-31`, `2026-02-03`, `2026-03-10`, `2026-01-18`), 2 ZIP5 codes (`83702` in the provider address, `10940` in the appeals address). Candidate names (weak heuristic): `Priya N. Shah`, `Mountain Ridge Orthopedics` (false positive, not a person). Please triage.
>
> **You:** Redact the patient name, both physician and staff names, the full provider and payer addresses, and all the IDs. Drop the clinic name — this is for a prospective customer. Anchor the dates on `2026-04-12` (the denial date) so DAY+0 is the denial itself.
>
> **Agent:** Building overrides. `Jane D.` → `[PATIENT]`; `Priya N. Shah, MD` → `[PROVIDER-Attending]`; `Mountain Ridge Orthopedics` → `[PROVIDER-Clinic]`; the two (208) phones and the (877) fax → `[CONTACT-REDACTED]`; NPIs, member ID, claim number, and payer reference → `[ID-REDACTED]`; provider street + city + ZIP → `[ADDRESS-REDACTED]`; appeals address → `[ADDRESS-REDACTED]`; DOB `1968-05-14` → age bucket `50s` (the patient is 57 on the anchor date). Calling `redact_phi_apply` with `source_path=drafts/appeal-2026-0412-88174.md`, `output_path=case-studies/appeal-redacted-2026-0412-88174.md`, `date_anchor=2026-04-12`.
>
> **Agent:** `redact_phi_apply` returned `ok: true`, `redactions_applied: 14`, `dates_rewritten: 7`, `unredacted_flags: []`. Dates anchored: 2026-04-12 → `DAY+0` (denial), 2026-04-21 → `DAY+9` (letter date), 2026-03-28 → `DAY-15` (admission), 2026-03-31 → `DAY-12` (discharge), 2026-03-10 → `DAY-33` (PT eval), 2026-02-03 → `DAY-68` (MRI), 2026-01-18 → `DAY-84` (ESI).
>
> **Agent:** Running `redact_phi_audit_check` on the sanitized file. `isError: true` — one reliable residual remains: a ZIP5 token `83702` in a line I missed because the street address was split across two lines and my override only covered line one. One advisory residual: a URL in the footer (`https://mountainridgeortho.com`) — I flagged that too since it would re-identify the clinic. Candidate names: 0.
>
> **You:** Good catch. Kill the ZIP and the URL.
>
> **Agent:** Added two more overrides (`83702` → `[ADDRESS-REDACTED]`, the URL → `[CONTACT-REDACTED]`), re-ran `redact_phi_apply` with the full override set against the same `source_path` and the same `output_path`, and re-ran `redact_phi_audit_check`. Result: `ok: true`, 0 reliable residuals, 0 advisory residuals, 0 candidate names.
>
> **Agent:** Sanitized file is at `case-studies/appeal-redacted-2026-0412-88174.md`. Reminder: this skill is a reviewer-aid, not a HIPAA Safe Harbor or Expert Determination certification. The clinic's HIPAA compliance officer should review the sanitized artifact before you send it to the prospective customer, and I would generalize the geography further ("US Mountain" or just "US") if the combination of a rare-enough spine-fusion pattern and a specific region could re-identify the patient.

## Sample output excerpt

The skill writes the sanitized artifact to the workspace path you chose and leaves the source untouched. Below: a before/after excerpt from the Anthem L4-L5 appeal letter after the conversation above, plus the JSON returned by `redact_phi_audit_check` on the passing run.

---

**Before** (`drafts/appeal-2026-0412-88174.md`, excerpt):

```markdown
Mountain Ridge Orthopedics
500 Alpine Way, Boise, ID 83702
NPI 1659321488 | Phone (208) 555-0144 | Fax (208) 555-0145

2026-04-21

Anthem Clinical Appeals
PO BOX 5063
Middletown, NY 10940
Fax: (877) 278-2163

**RE: Level 1 internal appeal — medical-necessity + retroactive-authorization**
Member: Jane D. | Member ID: YRK8827461 | DOB: 1968-05-14
Claim: C-9918274461 | Dates of service: 2026-03-28 to 2026-03-31
Denial date: 2026-04-12 | Payer reference: CA-2026-0412-88174

... a right L4-L5 transforaminal epidural steroid injection on
2026-01-18 with less than two weeks of relief. MRI dated 2026-02-03
demonstrates L4-L5 disc extrusion with right S1 nerve-root impingement.

Sincerely,

Priya N. Shah, MD
NPI 1487220913 | Board-certified, Orthopedic Surgery
```

**After** (`case-studies/appeal-redacted-2026-0412-88174.md`, excerpt):

```markdown
[PROVIDER-Clinic]
[ADDRESS-REDACTED]
NPI [ID-REDACTED] | Phone [CONTACT-REDACTED] | Fax [CONTACT-REDACTED]

DAY+9

Anthem Clinical Appeals
[ADDRESS-REDACTED]
Fax: [CONTACT-REDACTED]

**RE: Level 1 internal appeal — medical-necessity + retroactive-authorization**
Member: [PATIENT] | Member ID: [ID-REDACTED] | Age: 50s
Claim: [ID-REDACTED] | Dates of service: DAY-15 to DAY-12
Denial date: DAY+0 | Payer reference: [ID-REDACTED]

... a right L4-L5 transforaminal epidural steroid injection on
DAY-84 with less than two weeks of relief. MRI dated DAY-68
demonstrates L4-L5 disc extrusion with right S1 nerve-root impingement.

Sincerely,

[PROVIDER-Attending]
NPI [ID-REDACTED] | Board-certified, Orthopedic Surgery
```

Audit result on the passing run (from `redact_phi_audit_check`):

```json
{
  "ok": true,
  "reliable_residuals": [],
  "advisory_residuals": [],
  "candidate_names": [],
  "note": "This audit is a reviewer-aid. A passing check does not certify HIPAA Safe Harbor or Expert Determination de-identification; the clinic's compliance officer must review the sanitized artifact before external publication."
}
```

A failing audit (from the first pass, before the ZIP and URL were redacted):

```json
{
  "ok": false,
  "reliable_residuals": [
    { "category": "zip5", "span": [142, 147], "literal": "83702", "suggested_replacement": "US {region} (anonymized geography)" }
  ],
  "advisory_residuals": [
    { "category": "url", "span": [2618, 2648], "literal": "https://mountainridgeortho.com", "suggested_replacement": "[CONTACT-REDACTED]" }
  ],
  "candidate_names": []
}
```
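
A calling workflow could turn the audit result into a publish/no-publish decision along the following lines. This is a hedged sketch: `publication_blockers` is a hypothetical helper, and it assumes the response is available as a JSON string with the fields shown in the two examples above.

```python
import json


def publication_blockers(audit_json: str) -> list[str]:
    """Return human-readable reasons the artifact must not be published yet."""
    audit = json.loads(audit_json)
    blockers = [
        f'{residual["category"]}: {residual["literal"]}'
        for residual in audit.get("reliable_residuals", [])
    ]
    # Defensive assumption: treat ok=false as blocking even with no residual listed.
    if not audit.get("ok", False) and not blockers:
        blockers.append("audit reported ok=false without listing residuals")
    return blockers


if __name__ == "__main__":
    failing = '{"ok": false, "reliable_residuals": [{"category": "zip5", "literal": "83702"}]}'
    for reason in publication_blockers(failing):
        print("blocked by", reason)
```

Advisory residuals are deliberately not turned into blockers here; they stay a decision for the human reviewer.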


## Extension tools and validations

The skill registers three deterministic tools. None of them draft prose — the LLM triages ambiguous matches, the tools apply reviewed decisions and gate the publication.

### `redact_phi_scan`

- **Inputs:** `text` — the full text of the artifact to scan (plain text or markdown).
- **What it does:** runs the reliable-set regex pass over the input with allowlisted sentinels masked before scanning. Returns one `matches[]` entry per reliable-category hit (`category`, `span: [start, end]`, `literal`, `suggested_replacement`) and a separate `candidate_names[]` list from weak heuristics (title + capitalized, `Patient:` label, `Provider:` / `Signed by:` / `Attending:` labels).
- **Categories detected reliably:** `phone`, `fax`, `email`, `ssn`, `zip5`, `date` (ISO, US short, and long-form), `dob` (label-anchored), `mrn-labeled`, `member-labeled`, `claim-labeled`, `auth-labeled`, `certificate-labeled`, `account-labeled`. Advisory-only: `url`, `ip-address`. Weak heuristic: `candidate-name`.
- **Returns:** `{ reliable_match_count, candidate_name_count, matches[], candidate_names[], note }`. The note explicitly reminds the caller that `candidate_names` require LLM / human triage and that the tool is not a Safe Harbor determination.
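
To make the mask-then-scan idea concrete, here is a minimal sketch of a reliable-set pass. The regular expressions are simplified assumptions covering only a few categories (the real patterns are not documented here), and `mask_sentinels` and `scan` are hypothetical names. Sentinels are blanked with same-length padding so reported spans still index into the original text.

```python
import re

# Assumed approximation of the sentinel allowlist.
SENTINEL = re.compile(
    r"\[(?:PATIENT(?:-FAMILY)?|PROVIDER-[^\]]+|STAFF-[^\]]+|PAYER-REVIEWER"
    r"|ID-REDACTED|CONTACT-REDACTED|ADDRESS-REDACTED)\]|DAY[+-]\d+|90\+"
)

# Simplified, illustrative patterns for a subset of the reliable categories.
PATTERNS = {
    "phone": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}|\b\d{3}-\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "zip5": re.compile(r"\b\d{5}\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def mask_sentinels(text: str) -> str:
    """Replace each sentinel with spaces of equal length to keep offsets stable."""
    return SENTINEL.sub(lambda m: " " * len(m.group(0)), text)


def scan(text: str) -> list[dict]:
    """Return one entry per match, ordered by position in the text."""
    masked = mask_sentinels(text)
    matches = [
        {"category": category, "span": [m.start(), m.end()], "literal": m.group(0)}
        for category, pattern in PATTERNS.items()
        for m in pattern.finditer(masked)
    ]
    return sorted(matches, key=lambda entry: entry["span"][0])


if __name__ == "__main__":
    for hit in scan("Phone (208) 555-0144 | Boise, ID 83702 | letter 2026-04-21"):
        print(hit)
```

Label-anchored categories (`DOB:`, `MRN`, `Member ID`, and so on) and the weak name heuristics are left out of this sketch to keep it short.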

### `redact_phi_apply`

- **Inputs:** `source_path` (workspace-relative), `output_path` (workspace-relative, must differ from `source_path`), `overrides[]` (reviewed decisions — each entry is `{ span: [start, end], replacement: "<sentinel>" }`), `date_anchor` (ISO date that becomes `DAY+0`).
- **What it does:** resolves both paths against the workspace root and rejects anything that escapes it. Applies the reviewed overrides to the source text (sorted end-to-start so earlier spans are not invalidated). Then rewrites every parseable date (ISO, US short, long-form) as `DAY+{delta}` relative to the anchor, preserving clinical sequencing. Writes the sanitized text to `output_path` — never to `source_path`.
- **Returns:** `{ ok, path, redactions_applied, dates_rewritten, unredacted_flags }`. `unredacted_flags` is non-empty when an override span is out of range or a date literal is unparseable; both are non-fatal warnings the caller should still resolve.
- **`isError: true`:** on path-traversal violations, when `source_path` does not exist, when `source_path == output_path`, or when `date_anchor` does not parse as an ISO date.
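
The end-to-start ordering matters because each replacement changes the length of the text. A minimal sketch of that step, assuming non-overlapping half-open spans and using a hypothetical `apply_overrides` helper and flag shape:

```python
def apply_overrides(text: str, overrides: list[dict]) -> tuple[str, int, list[dict]]:
    """Apply reviewed span replacements; return (new_text, applied_count, flags)."""
    flags = []
    applied = 0
    # Work from the last span to the first so earlier offsets stay valid.
    for override in sorted(overrides, key=lambda o: o["span"][0], reverse=True):
        start, end = override["span"]
        if not (0 <= start < end <= len(text)):
            # Non-fatal: report the bad span instead of raising.
            flags.append({"reason": "span-out-of-range", "span": [start, end]})
            continue
        text = text[:start] + override["replacement"] + text[end:]
        applied += 1
    return text, applied, flags


if __name__ == "__main__":
    source = "Member: Jane D. | Member ID: YRK8827461"
    name_at = source.index("Jane D.")
    id_at = source.index("YRK8827461")
    redacted, count, warnings = apply_overrides(
        source,
        [
            {"span": [name_at, name_at + len("Jane D.")], "replacement": "[PATIENT]"},
            {"span": [id_at, id_at + len("YRK8827461")], "replacement": "[ID-REDACTED]"},
        ],
    )
    print(redacted, count, warnings)
```

Applying the same overrides start-to-end would shift the second span by the length difference between `Jane D.` and `[PATIENT]`, which is the failure the reverse sort avoids.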

### `redact_phi_audit_check`

- **Inputs:** `text` — the full text of the already-redacted artifact, typically read back from `output_path` after `redact_phi_apply`.
- **What it does:** masks the allowlisted sentinels (`[PATIENT]`, `[PATIENT-FAMILY]`, `[PROVIDER-*]`, `[STAFF-*]`, `[PAYER-REVIEWER]`, `[ID-REDACTED]`, `[CONTACT-REDACTED]`, `[ADDRESS-REDACTED]`, `DAY+n`, `DAY-n`, `90+`) and re-runs the scanner. Partitions residuals into `reliable_residuals[]` (blocking) and `advisory_residuals[]` (non-blocking but reported), plus the `candidate_names[]` list.
- **Returns:** `{ ok, reliable_residuals[], advisory_residuals[], candidate_names[], note }`.
- **`isError: true`:** whenever any reliable-category identifier remains. The calling workflow must not publish the artifact while this is the case.
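
The blocking versus non-blocking split can be pictured as a simple partition over scanner matches. The sketch below is an assumption about shape only: `partition_residuals` is a hypothetical name, and the category sets are copied from the lists in this section.

```python
RELIABLE = {
    "phone", "fax", "email", "ssn", "zip5", "date", "dob",
    "mrn-labeled", "member-labeled", "claim-labeled",
    "auth-labeled", "certificate-labeled", "account-labeled",
}
ADVISORY = {"url", "ip-address"}


def partition_residuals(matches: list[dict]) -> dict:
    """Split matches into blocking and advisory lists; ok is False if any block."""
    reliable = [m for m in matches if m["category"] in RELIABLE]
    advisory = [m for m in matches if m["category"] in ADVISORY]
    return {
        "ok": not reliable,  # advisory residuals alone do not block
        "reliable_residuals": reliable,
        "advisory_residuals": advisory,
    }


if __name__ == "__main__":
    result = partition_residuals(
        [
            {"category": "zip5", "literal": "83702"},
            {"category": "url", "literal": "https://example.org"},
        ]
    )
    print(result["ok"], len(result["reliable_residuals"]), len(result["advisory_residuals"]))
```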

### Validation invariants

- **Never writes to `source_path`.** The source artifact is always preserved. `output_path == source_path` is rejected before any file I/O.
- **Path-traversal guard.** Both paths must resolve inside the workspace root; anything resolving outside is rejected.
- **Reliable-set audit gate.** Any reliable-category residual in the output returns `isError: true`. Advisory residuals (URLs, IPs without an identifying label) are reported but do not block — the human reviewer decides.
- **Sentinel tokens are the only accepted redactions.** Ad-hoc placeholders (`[REDACTED]`, `[XXX]`, hand-written brackets) are not in the allowlist and will surface as residuals on the audit re-scan. Extending the vocabulary means updating `references/transformation-rules.md` in the same change.
- **Relative-date sequencing, not date removal.** Dropping calendar dates destroys the narrative; the skill rewrites every parseable date to `DAY+n` so a reader can still see the timeline. Year-only references and free-text durations are preserved as written.
- **Candidate names require human triage.** `candidate_names` from the weak heuristic are never applied automatically — they return to the reviewer for accept / modify / reject decisions. Non-Western names and single-token surnames that the heuristic misses must be caught by the reviewer.
- **No Safe Harbor claim.** A passing audit is not a Safe Harbor or Expert Determination certification. The note on every response reminds the caller; the sanitized artifact itself should carry a header noting the same.
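
The first two invariants (never write to the source, stay inside the workspace) can be checked before any file I/O. The following is a minimal sketch using `pathlib`, not the skill's actual guard; `resolve_in_workspace` and `check_paths` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_in_workspace(workspace_root: str, relative_path: str) -> Path:
    """Resolve a path against the workspace root and reject anything outside it."""
    root = Path(workspace_root).resolve()
    candidate = (root / relative_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {relative_path}")
    return candidate


def check_paths(workspace_root: str, source_path: str, output_path: str) -> tuple[Path, Path]:
    """Validate both paths and refuse to overwrite the source artifact."""
    source = resolve_in_workspace(workspace_root, source_path)
    output = resolve_in_workspace(workspace_root, output_path)
    if source == output:
        raise ValueError("output_path must differ from source_path")
    return source, output


if __name__ == "__main__":
    print(check_paths("workspace", "drafts/appeal.md", "case-studies/appeal-redacted.md"))
```

Comparing resolved paths rather than raw strings means an alias such as `drafts/../drafts/appeal.md` is still recognised as the source file.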

## Getting started

This skill is a **compliance gate**, not a drafting skill. Install it and run it at publication time — after the artifact is finished and before it leaves the clinic.

1. Install the skill from the Rakenne library. It contributes three tools (`redact_phi_scan`, `redact_phi_apply`, `redact_phi_audit_check`) and the three reference docs (`phi-categories.md`, `preservation-rules.md`, `transformation-rules.md`) that define the sentinel vocabulary and the preservation list.
2. Confirm the source is a **finished** Rakenne artifact — a rendered LOMN, appeal letter, peer-to-peer brief, `case.json`, `packet-index.md`, or `cover-sheet.md` that has already been through its own validators and export gate. Do not redact drafts still under revision.
3. Confirm the **external audience** with the user — peer clinician, consultant, prospective reference customer, or public write-up. The audience determines how aggressively geography and role labels should be generalized.
4. Ask the user to pick the **date anchor**. Use the earliest clinical event for LOMN / appeal, the denial date for a peer-to-peer brief, and `case.case_opened_at` or the earliest `timeline[].date` for a `case.json` teardown. Document the anchor choice in the sanitized artifact's header so a downstream reader can interpret `DAY+n` in context.
5. Run `redact_phi_scan` and triage every reliable match plus every `candidate_name` with the user. Accept, modify, or reject each one. Names from the weak heuristic must be reviewed by a human.
6. Run `redact_phi_apply` with the reviewed overrides, the source path, a **different** output path, and the date anchor. Resolve any `unredacted_flags` the tool returns.
7. Run `redact_phi_audit_check` on the sanitized output. If it returns `isError: true`, fix the residual, re-run `redact_phi_apply`, and re-audit. Do not publish until the audit is clean.
8. Surface the sanitized file to the user with a reminder that this is a reviewer-aid, not a Safe Harbor or Expert Determination certification. The clinic's HIPAA compliance officer reviews before external publication.

Do not run this skill on packets being submitted to the payer — the payer needs the identifying information to adjudicate the appeal. This skill is for externally shared or published copies only.



