A breakthrough in optical character recognition promises to eliminate the need for page-by-page processing in document digitization. Researchers have unveiled a method called one-shot long-horizon parsing that can extract text from entire documents in a single pass, regardless of length.
Traditional OCR systems struggle with multi-page documents, often requiring segmentation and sequential analysis that introduces errors and delays. The new approach treats the entire document as a unified entity, leveraging context across pages to improve accuracy.
The Core Breakthrough
The technique, detailed in a recent preprint, introduces a transformer-based architecture designed to handle long sequences of text without losing positional context. Instead of processing each page independently, the model encodes the whole document into a single representation, then decodes it in one forward pass.
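To make the contrast concrete, here is a minimal sketch of the difference between page-by-page input and a whole-document sequence. It is an illustration only, not the preprint's architecture: the `Token` class and both helper functions are hypothetical names, and simple whitespace-style tokens stand in for whatever visual or textual tokens the actual model uses.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    page: int      # index of the page the token came from
    position: int  # position index that would be fed to the model


def per_page_sequences(pages):
    """Page-by-page baseline: positions restart at 0 on every page."""
    return [
        [Token(text, page, i) for i, text in enumerate(tokens)]
        for page, tokens in enumerate(pages)
    ]


def whole_document_sequence(pages):
    """One-shot view: one sequence whose positions keep counting across page breaks."""
    sequence = []
    for page, tokens in enumerate(pages):
        for text in tokens:
            # len(sequence) is the next free global position
            sequence.append(Token(text, page, len(sequence)))
    return sequence


# Hypothetical two-page document with a sentence split across the page break.
pages = [["The", "tenant", "shall"], ["pay", "rent", "monthly"]]
doc = whole_document_sequence(pages)
```

In the per-page version, the first token of page two gets position 0 again, so nothing tells the model that "pay" continues the sentence begun on page one. In the whole-document version it gets position 3, which is the kind of cross-page positional context the article describes.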
This allows the system to maintain semantic coherence across page breaks and handle complex layouts such as tables, footnotes and headers. According to the preprint's initial benchmarks, the model achieves lower character error rates than leading commercial OCR engines on multi-page documents.
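For readers unfamiliar with the metric, character error rate is conventionally defined as the character-level edit distance between the OCR output and a reference transcription, divided by the length of the reference. The sketch below shows that standard definition. The function names are illustrative, and the preprint's exact evaluation protocol (text normalization, whitespace handling) is not described here, so treat this as an assumption about how such a score is typically computed.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length; lower is better."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

As an example, an engine that reads "clause" as "c1ause" makes one substitution in six characters, for a character error rate of 1/6.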
Why This Matters
Enterprise document processing remains a labor-intensive bottleneck. Industries such as legal, healthcare and finance routinely handle contracts, medical records and reports that span dozens or hundreds of pages. Current OCR workflows often require manual verification, particularly where text runs across page boundaries. One-shot parsing could automate these workflows with higher reliability.
For example, a 200-page legal contract could be digitized in seconds rather than minutes, with the system retaining the logical flow between clauses. This directly impacts e-discovery, compliance audits and archival digitization projects. Libraries and government agencies managing historical documents also stand to benefit from faster and more accurate transcription.
The technique also opens the door to real-time document understanding for assistive technologies. Visually impaired users could receive immediate, coherent text-to-speech from lengthy PDFs without page-by-page interruptions.
Broader AI Context
The development aligns with a wider trend in artificial intelligence toward long-context modeling. Just as language models have expanded their context windows, document processing models are now learning to handle entire documents at once. This shift reduces the need for complex pre-processing and enables end-to-end neural pipelines for document understanding.
Future work will likely extend the approach to multilingual documents and handwritten text. If the technique generalizes, it could eventually replace existing OCR platforms in many enterprise settings.
For now, the research demonstrates that one-shot parsing is not only possible but practical, offering a path toward truly seamless document digitization.



