Vision Language Model extraction. No templates. No OCR. No regex.
48 documents tested. 25+ vendor formats. $0.008 per document.
The old pipeline couldn't process most documents. Timeouts, format mismatches, regex failures on every new vendor.
On the 4 documents it did process, only 40% of fields were correct. Vendor names, dates, PO numbers: all wrong.
Three services chained together: DeepSeek OCR, regex parser, schema mapper. Expensive and slow.
OCR alone took 52 seconds before the LLM mapper even started. Total pipeline time was 90+ seconds.
Skip OCR entirely. Send PDF page images directly to the vision model. One API call. Structured JSON back.
Forced tool use guarantees the model returns data matching our exact schema. No parsing. No format validation. The JSON comes back ready to use.
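A minimal sketch of that one call, assuming the Anthropic Python SDK. The tool name, schema fields, and prompt here are illustrative, not the framework's actual definitions; forcing the tool via `tool_choice` is what guarantees schema-shaped output.

```python
import base64

import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative schema; the real one is defined per document type.
EXTRACT_TOOL = {
    "name": "record_extraction",
    "description": "Record the structured fields extracted from the document images.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "po_number": {"type": "string"},
            "order_date": {"type": "string", "description": "ISO 8601 date"},
            "total": {"type": "number"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "amount": {"type": "number"},
                    },
                    "required": ["description", "quantity", "amount"],
                },
            },
        },
        "required": ["vendor_name", "po_number", "total", "line_items"],
    },
}

def extract(page_pngs: list[bytes]) -> dict:
    """One API call: page images in, schema-shaped JSON out."""
    image_blocks = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.standard_b64encode(png).decode(),
            },
        }
        for png in page_pngs
    ]
    response = client.messages.create(
        model="claude-haiku-4-5",  # assumed model ID for Claude Haiku 4.5
        max_tokens=4096,
        tools=[EXTRACT_TOOL],
        # Forcing this tool means the reply MUST match input_schema.
        tool_choice={"type": "tool", "name": "record_extraction"},
        messages=[{
            "role": "user",
            "content": image_blocks + [
                {"type": "text", "text": "Extract the order fields from these pages."}
            ],
        }],
    )
    # With forced tool use, the first content block is the tool call.
    return response.content[0].input
```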
| # | Layer | What It Does | Technology |
|---|---|---|---|
| 1 | VLM Extraction | Schema-constrained image-to-JSON | Claude Haiku 4.5, forced tool use |
| 2 | Validation Engine | Financial math, dates, required fields, cross-field | Pure Python, <10ms |
| 3 | Suspicion Scoring | Weighted signals for uncertain fields | Risk score computation |
| 4 | Decision Engine | Routes to 6 outcomes based on risk | Deterministic routing |
| 5 | Sonnet Re-extraction | Targeted retry with error context | Claude Sonnet 4 |
No single point of failure. Each layer catches what the previous one missed.
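A sketch of what a layer-2 check looks like. Field names and the 1-cent tolerance are illustrative; the point is that it's pure Python with no model calls, which is why it runs in under 10ms.

```python
from datetime import date

def validate(doc: dict) -> list[str]:
    """Layer 2: deterministic checks. Returns human-readable issues."""
    issues = []

    # Required fields (per document type; these names are illustrative)
    for field in ("vendor_name", "po_number", "total", "line_items"):
        if not doc.get(field):
            issues.append(f"missing required field: {field}")

    # Financial math: line items must sum to the stated total
    total = doc.get("total")
    items = doc.get("line_items") or []
    if total is not None and items:
        line_sum = sum(item["amount"] for item in items)
        if abs(line_sum - total) > 0.01:
            issues.append(
                f"line items sum to {line_sum:.2f}, stated total is {total:.2f}"
            )

    # Cross-field date sequence: ship date cannot precede order date
    order_d, ship_d = doc.get("order_date"), doc.get("ship_date")
    if order_d and ship_d and date.fromisoformat(ship_d) < date.fromisoformat(order_d):
        issues.append("ship_date precedes order_date")

    return issues
```

Because this layer is deterministic, the same document always produces the same issues, which keeps the downstream routing reproducible.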
Adding a new document type requires only three things, with no code changes to the framework (a config sketch follows this list):

1. **Schema**: the JSON structure the model must return. Fields, types, required/optional.
2. **Prompt**: extraction instructions specific to the document type. What to look for, where.
3. **Validation rules**: which fields are required, financial relationships, date sequences.
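To make that concrete, a hypothetical plug-in config for one of the roadmap types might look like this. The key names and rule notation are assumptions for illustration, not the library's actual API.

```python
# Hypothetical plug-in for a new document type: no framework changes needed.
QUOTE_DOC_TYPE = {
    # 1. Schema: the JSON structure the model must return
    "schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "quote_number": {"type": "string"},
            "quote_date": {"type": "string", "description": "ISO 8601 date"},
            "valid_until": {"type": "string", "description": "ISO 8601 date"},
            "total": {"type": "number"},
        },
        "required": ["vendor_name", "quote_number", "total"],
    },
    # 2. Prompt: extraction instructions specific to this document type
    "prompt": (
        "This is a sales quote. The quote number is usually in the top-right "
        "header block; 'valid until' often appears near the signature line."
    ),
    # 3. Validation rules: required fields, financial and date relationships
    "rules": {
        "required": ["vendor_name", "quote_number", "total"],
        "financial": ["sum(line_items.amount) == total"],
        "dates": ["quote_date <= valid_until"],
    },
}
```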
Document types supported today: Purchase Order, Acknowledgment. On the roadmap: Invoice, Quote.
Goal: release as an open-source Python library. Any organization can plug in their schemas and get production-grade extraction.
Early benchmarks (43 unseen docs) showed 97.7% field accuracy. After iterative improvements and deployment, the final 5 production tests returned 100% accuracy on every field.
| Document | Vendor | Line Items | Total | Decision | Time |
|---|---|---|---|---|---|
| Order Acknowledgement | HON Company | 2/2 | $17,965.80 | AUTO_APPROVED | 25.2s |
| Order Acknowledgement | AMQ Solutions | 5/5 | $24,893.18 | AUTO_APPROVED | 12.5s |
| Purchase Order (23 items) | Knoll Inc | 23/23 | $8,692.27 | AUTO_APPROVED | 21.5s |
| Order Verification | Human Active Tech | 5/5 | $63,894.98 | AUTO_APPROVED | 10.5s |
| Office ACK (2 pages) | Bernhardt | 6/6 | $95,385.50 | AUTO_APPROVED | 11.8s |
Every vendor. Every PO number. Every total. Every line item. Perfect.
| Outcome | Rate | Human Needed? | What Happens |
|---|---|---|---|
| AUTO_APPROVED | 64% | No | Goes straight through |
| RETRY_WITH_SONNET | 22% | No | Auto-retry with a stronger model, transparent to the user |
| AUTO_CORRECTED | 7% | No | Minor fix applied automatically |
| PENDING_CONFIRM | 0% | Confirmation | Human confirms suggestion |
| PENDING_REVIEW | 0% | Full review | Human reviews from scratch |
| BLOCKED | 0% | Investigation | Quarantined for analysis |
93% fully automated. The previous system needed manual review on 60% of documents.
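One way layers 3 and 4 could be wired to produce those outcomes. The signal weights and thresholds below are assumptions for illustration, not the deployed values.

```python
# Layer 3: weighted suspicion signals -> one risk score per document.
# Weights and thresholds are illustrative, not the production values.
SIGNAL_WEIGHTS = {
    "math_mismatch": 0.5,        # line items don't sum to total
    "missing_required": 0.4,
    "date_out_of_sequence": 0.3,
    "low_confidence_field": 0.1,
}

def risk_score(signals: set[str]) -> float:
    return min(1.0, sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in signals))

# Layer 4: deterministic routing to one of the six outcomes.
def route(score: float, already_retried: bool) -> str:
    if score == 0.0:
        return "AUTO_APPROVED"
    if score < 0.15:
        return "AUTO_CORRECTED"       # minor, safely fixable
    if score < 0.5 and not already_retried:
        return "RETRY_WITH_SONNET"    # escalate to the stronger model
    if score < 0.5:
        return "PENDING_CONFIRM"      # human confirms a suggestion
    if score < 0.8:
        return "PENDING_REVIEW"       # human reviews from scratch
    return "BLOCKED"                  # quarantined for investigation
```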
| Volume | Our Framework | Traditional OCR | Savings |
|---|---|---|---|
| 1,000 docs/month | $8 | $50 - $150 | 6-19x |
| 10,000 docs/month | $80 | $500 - $1,500 | 6-19x |
| 100,000 docs/month | $800 | $5,000 - $15,000 | 6-19x |
Claude Haiku 4.5 is optimized for speed and cost. One API call per document. No OCR service. No regex engine. No template maintenance.
~$0.02 per Sonnet retry, and only about 20% of documents need one (the 22% escalation rate in the table above). Blended cost: ~$0.012/doc. Still 4-12x cheaper.
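The blended figure is plain expected-value arithmetic:

```python
base_cost  = 0.008  # Haiku extraction, one call per document
retry_cost = 0.02   # Sonnet re-extraction, only when escalated
retry_rate = 0.22   # escalation rate from the outcomes table above

blended = base_cost + retry_rate * retry_cost
print(f"${blended:.4f} per document")  # $0.0124, i.e. ~$0.012/doc
```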
| Phase | Documents | Accuracy | Key Win |
|---|---|---|---|
| Regex Parser | 4/15 (73% failed) | 40% | Dead end |
| LLM Mapper | 9/15 | 98% | AI mapping, still needs OCR |
| VLM Phase 1 | 15/15 | 90.6% | Zero failures, skip OCR |
| + Validation | 15/15 | 93.7% | Catches hallucinations |
| Unseen Benchmarks | 43 | 97.7% | 25+ vendors, first encounter |
| Production Deploy | 5 live | 100% | Deployed, verified, perfect |
Deployed. Live. Verified. Ready for any document type.