Document Intelligence Framework

From 40% Accuracy to 100% on Production

Vision Language Model extraction. No templates. No OCR. No regex.

48 documents tested. 25+ vendor formats. $0.008 per document.

The Problem

Legacy OCR Pipeline Was Broken

73% Failure Rate

Pipeline couldn't process most documents. Timeouts, format mismatches, regex failures on every new vendor.

40% Header Accuracy

On the 4 documents that did process, only 40% of fields were correct. Vendor names, dates, PO numbers all wrong.

$0.05+ Per Document

Three services chained together: DeepSeek OCR, regex parser, schema mapper. Expensive and slow.

52 Second Processing

OCR alone took 52 seconds before the LLM mapper even started. Total pipeline time was 90+ seconds.

The Solution

Direct VLM Extraction

Skip OCR entirely. Send PDF page images directly to the vision model. One API call. Structured JSON back.

PDF
->
Page Images
150 DPI PNG
->
Haiku 4.5
Forced Tool Use
->
Validation
5 Rules <10ms
->
Decision
6 Outcomes
->
Result
Structured JSON

Forced tool use guarantees the model returns data matching our exact schema. No parsing. No format validation. The JSON comes back ready to use.

Architecture

Five Verification Layers

#LayerWhat It DoesTechnology
1VLM ExtractionSchema-constrained image-to-JSONClaude Haiku 4.5, forced tool use
2Validation EngineFinancial math, dates, required fields, cross-fieldPure Python, <10ms
3Suspicion ScoringWeighted signals for uncertain fieldsRisk score computation
4Decision EngineRoutes to 6 outcomes based on riskDeterministic routing
5Sonnet Re-extractionTargeted retry with error contextClaude Sonnet 4

No single point of failure. Each layer catches what the previous one missed.

Design

Adapter Pattern for Any Document Type

Adding a new document type requires only three things. No code changes to the framework.

1. Schema Definition

JSON structure the model must return. Fields, types, required/optional.

2. System Prompt

Extraction instructions specific to the document type. What to look for, where.

3. Validation Rules

Which fields are required. Financial relationships. Date sequences.

Current Adapters

Purchase Order Acknowledgment Invoice (roadmap) Quote (roadmap)

Goal: release as an open-source Python library. Any organization can plug in their schemas and get production-grade extraction.

Production Results

48 Documents. 25+ Vendors. 100% on Production.

100%
Production Accuracy
41/41
Line Items Recovered
$0.008
Per Document
16s
Avg Processing

Early benchmarks (43 docs) showed 97.7%. After iterative improvements and deployment, the final 5 production tests returned 100% accuracy on every field.

Live Endpoint Verification

5 Documents on Production

DocumentVendorItemsTotalDecisionTime
Order AcknowledgementHON Company2/2$17,965.80AUTO_APPROVED25.2s
Order AcknowledgementAMQ Solutions5/5$24,893.18AUTO_APPROVED12.5s
Purchase Order (23 items)Knoll Inc23/23$8,692.27AUTO_APPROVED21.5s
Order VerificationHuman Active Tech5/5$63,894.98AUTO_APPROVED10.5s
Office ACK (2 pages)Bernhardt6/6$95,385.50AUTO_APPROVED11.8s

Every vendor. Every PO number. Every total. Every line item. Perfect.

Automation

93% Zero Human Touch

OutcomeRateHuman Needed?What Happens
AUTO_APPROVED64%NoGoes straight through
RETRY_WITH_SONNET22%NoAuto-retry with stronger model, transparent
AUTO_CORRECTED7%NoMinor fix applied automatically
PENDING_CONFIRM0%ConfirmationHuman confirms suggestion
PENDING_REVIEW0%Full reviewHuman reviews from scratch
BLOCKED0%InvestigationQuarantined for analysis

93% fully automated. The previous system needed manual review on 60% of documents.

Economics

6-19x Cheaper Than Traditional OCR

VolumeOur FrameworkTraditional OCRSavings
1,000 docs/month$8$50 - $1506-19x
10,000 docs/month$80$500 - $1,5006-19x
100,000 docs/month$800$5,000 - $15,0006-19x

Why So Cheap?

Claude Haiku 4.5 is optimized for speed and cost. One API call per document. No OCR service. No regex engine. No template maintenance.

Sonnet Retry Cost

~$0.02 per retry. Only 20% of docs need it. Blended cost: $0.012/doc. Still 4-12x cheaper.

The Journey

From Broken to State of the Art

PhaseDocumentsAccuracyKey Win
Regex Parser4/15 (73% failed)40%Dead end
LLM Mapper9/1598%AI mapping, still needs OCR
VLM Phase 115/1590.6%Zero failures, skip OCR
+ Validation15/1593.7%Catches hallucinations
Unseen Benchmarks4397.7%25+ vendors, first encounter
Production Deploy5 live100%Deployed, verified, perfect

Roadmap

Open-Source Python Library

from agentic_doc_intel import DocumentIntelligence, Adapter engine = DocumentIntelligence( model="claude-haiku-4-5", retry_model="claude-sonnet-4", ) engine.register_adapter(Adapter.from_schema("purchase_order", po_schema)) engine.register_adapter(Adapter.from_schema("invoice", inv_schema)) result = engine.process("document.pdf", doc_type="invoice") # result.decision = "AUTO_APPROVED" # result.confidence = 0.95 # result.vendor_name = "Acme Corp"

Shipped

  • VLM extraction engine
  • 5-layer validation
  • Decision routing (6 outcomes)
  • PO + ACK adapters
  • Production deployment

Next

  • Python package release (Q3)
  • Invoice + Quote adapters
  • Multi-model backends
  • Batch processing API
  • Self-hosted option

Summary

Production-Grade
Document Intelligence

100%
Accuracy
$0.008
Per Doc
16s
Processing
93%
Automated

Deployed. Live. Verified. Ready for any document type.

1 / 12