Template and Extraction Tools: How Rakenne Stabilizes AI-Generated Documents

Learn how Rakenne's Template and Extraction pipelines guarantee consistent formatting, protect regulatory language, validate data, and produce auditable documents — even when working with an LLM.

beginner
12 min read
2026-03-03
Files

Author Ricardo Cabral · Founder

When you use a generic chat AI to draft a regulated document — a securities prospectus, an NDA, a compliance filing — the results are unpredictable. The same currency value might appear as “$ 1.500,00” on one page and “$1,500” on another. A legal disclaimer gets subtly reworded. An EIN has a transposed digit that nobody catches until a regulator does.

Rakenne solves this with two built-in pipelines that every skill can use: Template Tools for producing documents and Extraction Tools for reading them. This article explains what they do, why they matter, and what guarantees they give you as a domain expert using Rakenne.

The key insight: separate what the AI does from what it shouldn’t

Rakenne doesn’t ask the AI to write your document from scratch. Instead, it splits the work:

The AI handles	The tools handle
Understanding your instructions	Formatting numbers, dates, and identifiers
Reading source documents and extracting data	Validating data types and required fields
Asking clarifying questions	Rendering the final document from a template
Drafting narrative sections (risk factors, descriptions)	Protecting regulatory text from any changes
Filling in gaps with your guidance	Auditing the output for errors and data leakage

The AI works with you to gather and organize the data. Then deterministic tools — code that runs the same way every time — turn that data into the finished document. The AI never touches the formatting, never rewrites a legal disclaimer, and never decides how to present a number.

What are Template Tools?

Template Tools are a three-step pipeline that turns structured data into a formatted document. Every skill that produces a formal document uses them behind the scenes.

Step 1: Data Validation

Before anything is rendered, the system checks every piece of data against a schema — a set of rules that define what the document requires.

What gets checked:

Required fields: Is the fund name present? Is the EIN filled in? Is the prospectus date set?
Format compliance: Identifiers are checked against their expected layout (an EIN by its length and separator position, a CNPJ by its check digits). Dates must be valid calendar dates. Currency values must be non-negative integers. Percentages must be between 0 and 1.
Fill rate: The system reports how many fields are populated vs. how many are needed (e.g., “245 of 289 variables filled, 84.8%”).

If validation fails, the agent tells you exactly what’s wrong and helps you fix it before proceeding. You never get a document with silently wrong data.
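
As a rough illustration, a schema check of this kind could look like the Python sketch below. The schema layout, the field names, and the `validate` function are hypothetical and chosen for the example; they are not Rakenne's actual API.

```python
import re
from datetime import date

# Hypothetical schema: field name -> (type tag, required flag).
SCHEMA = {
    "fund_name": ("text", True),
    "ein": ("ein", True),
    "prospectus_date": ("date", True),
    "offering_amount": ("currency", True),
    "management_fee": ("percentage", False),
}

def check_value(kind, value):
    """Return an error message for a bad value, or None if it is acceptable."""
    if kind == "ein":
        # Nine digits, optionally written with the separator as NN-NNNNNNN.
        if not re.fullmatch(r"\d{2}-?\d{7}", str(value)):
            return "EIN must be 9 digits, optionally written as NN-NNNNNNN"
    elif kind == "date":
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            return "not a valid calendar date (expected YYYY-MM-DD)"
    elif kind == "currency":
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            return "currency must be a non-negative integer (minor units)"
    elif kind == "percentage":
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            return "percentage must be between 0 and 1"
    return None

def validate(data):
    """Return (errors, filled_count, fill_rate) for a data record."""
    errors = []
    filled = 0
    for name, (kind, required) in SCHEMA.items():
        value = data.get(name)
        if value is None or value == "":
            if required:
                errors.append(f"{name}: required field is missing")
            continue
        filled += 1
        problem = check_value(kind, value)
        if problem:
            errors.append(f"{name}: {problem}")
    return errors, filled, filled / len(SCHEMA)
```

Calling `validate` on a record with an impossible date such as `2026-02-30` and a negative amount returns one specific error per bad field. The fill rate is simply filled fields divided by total fields, so 245 of 289 gives 84.8%.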

Step 2: Rendering

This is where data becomes a document. The rendering engine:

Formats every value according to locale rules. A currency value stored as 150000 (centavos) always becomes R$ 1.500,00 in Brazilian Portuguese, with the correct decimal separator, thousands separator, and currency symbol. A date stored as 2026-03-03 becomes 3 de março de 2026 in long format or 03/03/2026 in short format.
Includes or omits optional sections based on whether the data exists. If a field isn’t applicable to your document, the section is cleanly omitted rather than showing an awkward blank.
Marks missing data visibly. Any field that hasn’t been filled yet appears as [PENDING: field_name] in the draft — making gaps impossible to miss.
Locks regulatory text. Passages that must appear verbatim (like CVM disclaimers or statutory language) are rendered from the template exactly as written. The AI cannot rephrase, summarize, or “improve” them.
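
The optional-section and missing-data behavior described above can be sketched in a few lines of Python. The `{{field}}` placeholder syntax and the `render` function are assumptions made for this example; the article does not describe Rakenne's real template syntax.

```python
import re

# Hypothetical placeholder syntax: {{field_name}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render_line(template, data):
    """Fill placeholders; unfilled ones become visible [PENDING: ...] markers."""
    def fill(match):
        name = match.group(1)
        value = data.get(name)
        if value is None or value == "":
            return f"[PENDING: {name}]"
        return str(value)
    return PLACEHOLDER.sub(fill, template)

def render(sections, data):
    """sections: list of (template_text, optional_field_or_None).

    A section tied to an optional field is dropped entirely when that
    field is absent, instead of being rendered with a blank.
    """
    out = []
    for text, depends_on in sections:
        if depends_on is not None and data.get(depends_on) in (None, ""):
            continue
        out.append(render_line(text, data))
    return "\n".join(out)
```

With only `fund_name` supplied, a required `administrator` line renders as `[PENDING: administrator]`, while a section that depends on an optional `guarantor` field does not appear at all.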

Supported formatting includes:

Data type	What you provide	What the document shows
Currency	`150000` (centavos)	R$ 1.500,00
Currency with words	`150000`	R$ 1.500,00 (um mil e quinhentos reais)
Date (long)	`2026-03-03`	3 de março de 2026
Date (short)	`2026-03-03`	03/03/2026
Percentage	`0.015`	1,50%
Brazilian CNPJ	`11222333000181`	11.222.333/0001-81
Brazilian CPF	`12345678909`	123.456.789-09
US EIN	`123456789`	12-3456789
Number	`1500`	1.500

Every skill can also define domain-specific formatters. For example, a Brazilian capital markets skill maps fund type codes to their full legal names: FIDC becomes Fundo de Investimento em Direitos Creditórios, always — not a paraphrase, not an abbreviation.
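
Most rows of the formatting table reduce to small pure functions. The Python sketch below is one possible way to write them under pt-BR conventions; the function names are invented for illustration, and the "currency with words" row is left out for brevity.

```python
from datetime import date

MONTHS_PT = ["janeiro", "fevereiro", "março", "abril", "maio", "junho",
             "julho", "agosto", "setembro", "outubro", "novembro", "dezembro"]

# Domain-specific mapping; only the FIDC entry comes from the article.
FUND_TYPES = {"FIDC": "Fundo de Investimento em Direitos Creditórios"}

def group_thousands(n):
    """1500 -> '1.500' (pt-BR uses '.' as the thousands separator)."""
    return f"{n:,}".replace(",", ".")

def format_brl(centavos):
    """150000 -> 'R$ 1.500,00'. Input is an integer amount in centavos."""
    reais, cents = divmod(centavos, 100)
    return f"R$ {group_thousands(reais)},{cents:02d}"

def format_date_long(iso):
    d = date.fromisoformat(iso)
    return f"{d.day} de {MONTHS_PT[d.month - 1]} de {d.year}"

def format_date_short(iso):
    d = date.fromisoformat(iso)
    return f"{d.day:02d}/{d.month:02d}/{d.year}"

def format_percentage(fraction):
    """0.015 -> '1,50%'. Input is a fraction between 0 and 1."""
    return f"{fraction * 100:.2f}".replace(".", ",") + "%"

def format_cnpj(digits):
    return f"{digits[:2]}.{digits[2:5]}.{digits[5:8]}/{digits[8:12]}-{digits[12:]}"

def format_cpf(digits):
    return f"{digits[:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9:]}"

def format_ein(digits):
    return f"{digits[:2]}-{digits[2:]}"

def format_fund_type(code):
    # A KeyError on an unknown code is deliberate: never guess a legal name.
    return FUND_TYPES[code]
```

Storing money as integer centavos keeps floating-point rounding out of the pipeline, which is presumably why the table uses `150000` rather than `1500.00`.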

Step 3: Audit

After rendering, the system runs an automated audit that checks three things:

Immutable zone integrity. Certain passages (regulatory disclaimers, statutory language, standard legal warnings) are marked as “immutable” in the template. The audit computes a SHA-256 cryptographic hash of each zone as written in the template and compares it with the hash of the same zone in the rendered output. If even a single character was altered, the hashes differ and the audit flags the zone. This is how Rakenne guarantees that a Brazilian CVM disclaimer reading “AS COTAS NÃO CONTAM COM GARANTIA DO ADMINISTRADOR…” will appear exactly as written in every document that uses the template.
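
A hash comparison of this kind can be sketched with Python's standard `hashlib`. How zones are delimited and located in the rendered output is not described in this article, so the sketch simply assumes both sides are available as dictionaries keyed by a hypothetical zone id.

```python
import hashlib

def zone_hash(text):
    """SHA-256 of the zone's UTF-8 bytes, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def audit_immutable_zones(template_zones, rendered_zones):
    """Return the ids of zones that are missing or altered in the output.

    Both arguments map a zone id to the zone's text.
    """
    violations = []
    for zone_id, original in template_zones.items():
        rendered = rendered_zones.get(zone_id)
        if rendered is None or zone_hash(rendered) != zone_hash(original):
            violations.append(zone_id)
    return violations

# The disclaimer below is truncated exactly as it is quoted in the article.
template = {"cvm_disclaimer": "AS COTAS NÃO CONTAM COM GARANTIA DO ADMINISTRADOR…"}
intact = dict(template)
altered = {"cvm_disclaimer": "As cotas não contam com garantia do administrador…"}
```

Here `audit_immutable_zones(template, intact)` returns an empty list, while the lower-cased copy in `altered` is reported, because any change to the bytes changes the hash.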

Change budgets. Each section of the document has a maximum allowed deviation from the template. For example, a risk factors section might allow only 5% change because most of it is standard regulatory language, while a description section might allow 15% because the content varies per fund. If a section exceeds its budget, the audit flags it for your review.
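
The article does not say how deviation is measured. One plausible approach, shown here purely as an assumption, is a word-level similarity ratio from Python's `difflib`, with deviation defined as one minus that ratio. The budget values mirror the examples in the paragraph above.

```python
from difflib import SequenceMatcher

# Budgets from the example above: fraction of allowed deviation per section.
BUDGETS = {"risk_factors": 0.05, "description": 0.15}

def deviation(template_text, rendered_text):
    """1 - similarity ratio, computed over words rather than characters."""
    a, b = template_text.split(), rendered_text.split()
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()

def over_budget(template_sections, rendered_sections, budgets=BUDGETS):
    """Return {section: deviation} for every section that exceeds its budget."""
    flagged = {}
    for name, budget in budgets.items():
        dev = deviation(template_sections[name], rendered_sections[name])
        if dev > budget:
            flagged[name] = dev
    return flagged
```

Under this metric, changing one word in a ten-word risk factors section gives a deviation of about 10%, which exceeds a 5% budget, while changing one word in a twenty-word description stays well inside a 15% budget.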

Anti-contamination scanning. When you’re producing a new document based on a previous one (e.g., creating Fund B’s prospectus starting from Fund A’s data), the audit scans the rendered output for any traces of the old document’s data. It checks every display format — if Fund A’s CNPJ 12.345.678/0001-90 appears anywhere in Fund B’s document, in any format (formatted, unformatted, partial), the audit catches it. This prevents accidental data leakage between documents.
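
A scan like this amounts to generating every display form of each old value and searching the new document for all of them. The Python sketch below does this for a CNPJ; which partial forms are worth checking is an assumption here (the 8-digit company root), and the function names are hypothetical.

```python
def cnpj_variants(digits):
    """Display forms of a 14-digit CNPJ that could leak into another document."""
    formatted = f"{digits[:2]}.{digits[2:5]}.{digits[5:8]}/{digits[8:12]}-{digits[12:]}"
    root = digits[:8]  # assumed partial form: the company root
    root_formatted = f"{root[:2]}.{root[2:5]}.{root[5:]}"
    return {digits, formatted, root, root_formatted}

def scan_for_contamination(rendered_text, old_values):
    """old_values maps a label to a set of display variants.

    Returns a sorted list of (label, variant) pairs found in the text.
    """
    return sorted(
        (label, variant)
        for label, variants in old_values.items()
        for variant in variants
        if variant in rendered_text
    )

old = {"fund_a_cnpj": cnpj_variants("12345678000190")}
leaky = "Fund B is administered by the entity registered under CNPJ 12.345.678/0001-90."
clean = "Fund B is administered by the entity registered under CNPJ 11.222.333/0001-81."
```

Scanning `leaky` reports both the full formatted CNPJ and its formatted root, while `clean` produces no hits.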

What are Extraction Tools?

Extraction Tools are the reverse of Template Tools: instead of turning data into a document, they turn a document into structured data. This is used when you have an existing PDF (like a precedent prospectus or a reference filing) and want to extract its data into a new document.

How extraction works

1. Document ingestion. The system converts your uploaded PDF into a searchable, page-by-page text format where every line has a stable address (page number + line number).

2. Section detection. Using a domain-specific taxonomy (defined per skill), the system automatically identifies sections in the document — cover page, offering characteristics, risk factors, schedule, etc. — by recognizing heading patterns.
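
Heading-based section detection can be approximated with one regular expression per section. The taxonomy below is a made-up three-entry example with English and Portuguese headings; a real skill would define its own patterns.

```python
import re

# Hypothetical taxonomy: section id -> heading pattern.
TAXONOMY = {
    "cover_page": re.compile(r"^\s*PROSPECT", re.IGNORECASE),
    "risk_factors": re.compile(
        r"^\s*(\d+\.?\s+)?(RISK FACTORS|FATORES DE RISCO)\b", re.IGNORECASE),
    "schedule": re.compile(
        r"^\s*(\d+\.?\s+)?(SCHEDULE|CRONOGRAMA)\b", re.IGNORECASE),
}

def detect_sections(lines):
    """lines: list of (page, line_no, text) in document order.

    Returns [(section_id, page, line_no)] for every line that looks like
    a section heading. The first matching pattern wins.
    """
    found = []
    for page, line_no, text in lines:
        for section, pattern in TAXONOMY.items():
            if pattern.search(text):
                found.append((section, page, line_no))
                break
    return found
```

Anchoring each pattern at the start of the line keeps body text such as "Market risk may affect returns" from being mistaken for the risk factors heading.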

3. Data extraction with evidence. The AI reads each section and extracts every relevant data point. Unlike a generic AI that just gives you a value, Rakenne records evidence for every extracted value: the exact text that was read, the page it came from, and the line numbers. This creates full traceability: you can verify any extracted value against its source.
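
To make the idea of page-and-line evidence concrete, here is a small Python sketch. It assumes the PDF text has already been extracted into one string per page; the `Evidence` record and the `verify` helper are illustrative names, not Rakenne's data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    variable: str
    value: str       # normalized value
    quote: str       # exact text read from the source
    page: int        # 1-based
    line_start: int  # 1-based, inclusive
    line_end: int    # 1-based, inclusive

def index_pages(pages):
    """Map (page, line) -> text so every line has a stable address."""
    return {
        (p, n): text
        for p, page_text in enumerate(pages, 1)
        for n, text in enumerate(page_text.splitlines(), 1)
    }

def verify(evidence, index):
    """True if the quoted text really appears at the cited page and lines."""
    cited = " ".join(
        index.get((evidence.page, n), "")
        for n in range(evidence.line_start, evidence.line_end + 1)
    )
    return evidence.quote in cited

pages = ["PROSPECTUS\nFund Alpha\nCNPJ 11.222.333/0001-81", "Risk factors"]
index = index_pages(pages)
good = Evidence("fund_cnpj", "11222333000181", "11.222.333/0001-81", 1, 3, 3)
bad = Evidence("fund_cnpj", "11222333000181", "11.222.333/0001-81", 2, 1, 1)
```

`verify(good, index)` succeeds because the quote sits on page 1, line 3; `verify(bad, index)` fails because the cited address does not contain it.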

4. Automatic normalization. Raw values from the document are automatically converted to canonical formats:

"R$ 500.000.000,00" becomes 50000000000 (centavos), ready for consistent formatting later
"12.345.678/0001-95" becomes "12345678000195", validated with check-digit verification
"02/03/2026" (day/month/year) becomes "2026-03-02", an unambiguous ISO date

5. Conflict resolution. When the same data point appears in multiple sections of a document (e.g., the fund’s CNPJ on the cover page and in the service providers section), the system resolves conflicts deterministically using section priority, specificity, and validation status — not by asking the AI to guess which one is “right”.
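
Deterministic resolution can be expressed as a fixed sort key. The sketch below ranks candidates by validation status first, then section priority, then specificity; that ordering, the priority table, and the `Candidate` record are all assumptions, since the article names the criteria but not how they are weighted.

```python
from dataclasses import dataclass

# Hypothetical priorities: a lower number wins.
SECTION_PRIORITY = {"cover_page": 0, "fund_identity": 1, "service_providers": 2}

@dataclass(frozen=True)
class Candidate:
    value: str
    section: str
    specificity: int  # higher means a more specific mention
    valid: bool       # passed format and check-digit validation
    page: int
    line: int

def resolve(candidates):
    """Pick a winner by a fixed rule; return (winner, conflict_flag)."""
    if not candidates:
        raise ValueError("no candidates to resolve")

    def key(c):
        # Valid values first, then section priority, then specificity.
        # Page and line act as a final tie-breaker so the result never
        # depends on input order.
        return (not c.valid, SECTION_PRIORITY.get(c.section, 99),
                -c.specificity, c.page, c.line)

    ranked = sorted(candidates, key=key)
    winner = ranked[0]
    conflict = any(c.value != winner.value for c in ranked[1:])
    return winner, conflict

cover = Candidate("11222333000118", "cover_page", 1, False, 1, 3)
providers = Candidate("11222333000181", "service_providers", 2, True, 12, 7)
```

Here `resolve([cover, providers])` picks the `providers` value even though the cover page has higher section priority, because the cover-page value failed validation. The conflict flag is set so the disagreement can appear in the traceability report.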

6. Double-check. After extraction, the system runs a pattern-matching sweep to find values the AI may have missed, scanning for currency patterns, date patterns, CNPJ patterns, and more. This safety net improves fill rates and reduces the need for manual data entry.
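
A sweep of this kind can be approximated with a handful of regular expressions. The three patterns below cover pt-BR currency, day/month/year dates, and formatted CNPJs; they are illustrative and certainly narrower than what a production skill would use.

```python
import re

PATTERNS = {
    "currency": re.compile(r"R\$\s?\d{1,3}(?:\.\d{3})*,\d{2}"),
    "date": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    "cnpj": re.compile(r"\b\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}\b"),
}

def sweep(lines, already_extracted):
    """Find pattern matches that the extraction step did not record.

    lines: list of (page, line_no, text).
    already_extracted: set of raw strings the AI already captured.
    Returns [(kind, raw_text, page, line_no)].
    """
    missed = []
    for page, line_no, text in lines:
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                if match.group() not in already_extracted:
                    missed.append((kind, match.group(), page, line_no))
    return missed

lines = [
    (1, 4, "Offering of R$ 500.000.000,00 on 02/03/2026"),
    (2, 1, "CNPJ 11.222.333/0001-81"),
]
missed = sweep(lines, already_extracted={"R$ 500.000.000,00"})
```

In this example the currency value was already captured, so `missed` contains only the date on page 1 and the CNPJ on page 2, each with its page and line address.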

7. Traceability report. The final output includes not just the extracted data, but a full traceability report: for every variable, which section it came from, what the original text said, what page and line it was on, whether there were conflicts and how they were resolved, and what the fill rate is across the entire document.

How a typical session works

Here’s what you experience as a user when working with a skill that uses these tools:

1. You start a project and describe what you need. For example: “I need to draft a FIDC prospectus for Fund Alpha” or “I need to extract data from this existing prospectus PDF.”

2. The agent guides you through data collection. It asks structured questions, section by section: fund identity, service providers, offering terms, share structure, risk factors, schedule. If you uploaded a reference PDF, it extracts most of this automatically and asks you to confirm or fill gaps.

3. You see clear progress. The agent reports fill rates: “We have 245 of 289 fields populated (84.8%). The remaining gaps are in risk factors and additional information. Want to provide the market risk description, or should I draft it based on standard language?”

4. The agent validates before rendering. When the data is ready, the agent runs validation. If a CNPJ has a bad check digit, or a required date is missing, or a currency value is negative, you see specific error messages — not a vaguely wrong document.

5. You get a rendered document with guarantees. The output document has consistent formatting throughout, all regulatory text is verbatim, and every [PENDING: ...] marker tells you exactly what’s still missing.

6. Changes go through the pipeline again. When you ask for revisions — “change the offering amount to R$ 750 million” — the agent updates the data and re-runs validation, rendering, and audit. You’re never editing raw text where a formatting mistake could slip through.

7. The audit gives you confidence. Before delivery, the audit confirms: all immutable zones are intact, all change budgets are within limits, and (if applicable) no data from a previous document leaked in. This is your safety net before sending the document forward.

What guarantees do you get?

Guarantee	What it means for you
Consistent formatting	Every currency value, date, percentage, and identifier is formatted the same way, every time. No more “R$ 1.500,00” on page 3 and “R$1500” on page 12.
Data validation	CNPJs are verified with check-digit math, not just eyeballed. Dates must be real calendar dates. Required fields must be present. You catch errors before the document is finalized, not after.
Immutable regulatory text	Disclaimers, statutory language, and standard legal passages are locked in place with cryptographic verification. The AI cannot rephrase, shorten, or “improve” them — they appear exactly as the regulation requires.
Change control	Each section has a deviation budget. If the rendered document changed more than expected from the template, the system flags it. This catches unintended edits or AI hallucinations that slipped into data fields.
Anti-contamination	When reusing data from a previous document, the system scans the output for any leftover values from the old document — in every possible display format. Fund A’s data won’t accidentally appear in Fund B’s filing.
Full traceability	Every extracted value records its source: page, line number, evidence text. Every rendering decision is recorded in a manifest. You can audit any value back to where it came from.
Visible gaps	Missing data shows as `[PENDING: field_name]` instead of being silently omitted. You always know what’s incomplete.
Deterministic output	Same data + same template = identical document. The output doesn’t depend on the AI’s “mood” or prompt phrasing. If you need to re-render next week, you get the exact same result.

Why this is different from “just using ChatGPT”

Concern	Generic AI chat	Rakenne with Template & Extraction Tools
Formatting	The AI decides how to format values; it varies between responses.	Deterministic formatters produce identical output every time, respecting locale rules.
Regulatory text	The AI may paraphrase or “improve” legal language.	Immutable zones are hash-locked. Any alteration is caught by the audit.
Data accuracy	No built-in validation. A CNPJ with a transposed digit passes unnoticed.	Schema validation checks formats, types, ranges, and check digits before rendering.
Traceability	You get a document. You don’t know where data came from.	Every extracted value has a page:line evidence trail. Every rendering decision is in the manifest.
Consistency	Ask the same question twice, get different formatting.	Same data + same template = same document, guaranteed.
Contamination	Copy-paste errors from precedent documents go undetected.	Anti-contamination scan checks all display forms of previous document’s data.
Completeness	Missing sections may be silently omitted.	`[PENDING: ...]` markers make every gap visible. Fill rate percentage shows overall progress.

Real-world example: securities prospectus

The FIDC Prospectus skill (doc-oferta-fidc) is one of Rakenne’s most comprehensive template-driven skills. It produces Brazilian securities offering documents compliant with CVM Resolution 160/2022, Annex D.

Scale: 289 variables organized across 25 groups (cover page, fund identity, service providers, offering terms, share structure, risk factors, schedule, and more) rendered into 17 document sections.

Immutable zones protect: CVM regulatory disclaimers on the cover page, standard investment risk warnings, investor inadequacy notices, offer suspension/cancellation/revocation procedures (verbatim from CVM 160), and document availability notices.

Locale-aware formatting: All values are rendered in the pt-BR convention, for example R$ 1.500.000,00 (um milhão e quinhentos mil reais), 03/03/2026, 11.222.333/0001-81, 1,50%.

Change budgets: Risk factors allow only 5% deviation (mostly regulatory boilerplate). Cover page allows 15% (more variable content per fund). Overall document allows 10%.

Extraction pipeline: When a user uploads an existing prospectus PDF, the system automatically detects all 17 sections using heading patterns, extracts variables with page:line evidence, resolves conflicts when the same value appears in multiple sections, and produces a traceability report showing the provenance of every data point.

The result: a domain expert can produce a 40-page CVM-compliant prospectus through conversation with the agent, with the confidence that every number is formatted correctly, every disclaimer is verbatim, every CNPJ passes check-digit validation, and the entire document is auditable.

Summary

Rakenne’s Template and Extraction Tools exist because document production is too important to leave entirely to an AI’s probabilistic output. They create a clear division of labor:

You bring the domain expertise, make decisions, and provide or approve content.
The AI helps you gather data, extract information from existing documents, draft narrative sections, and navigate the workflow.
The tools handle everything that must be deterministic: formatting, validation, regulatory text protection, auditing, and traceability.

The result is documents that are AI-assisted but tool-verified — the speed of AI with the precision of code.