Efficient Domain Adaptation for Text Line Recognition via Decoupled Language Models
Arundhathi Dev, Justin Zhan

TL;DR
This paper introduces a modular OCR framework that decouples visual detection from linguistic correction, enabling resource-efficient domain adaptation with near state-of-the-art accuracy across diverse document types.
Contribution
The paper presents a lightweight, decoupled approach in which pretrained sequence models are trained on synthetic noise to act as correctors, substantially reducing the compute needed for domain adaptation.
Findings
T5-Base performs best on modern text with standard vocabulary.
ByT5-Base excels on historical documents by reconstructing archaic spellings.
The approach reduces compute by approximately 95% compared to end-to-end transformers.
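As a rough illustration of the T5 versus ByT5 contrast above: ByT5 models operate on raw UTF-8 bytes rather than a learned subword vocabulary, so any spelling, however archaic, maps to a valid input sequence. The sketch below only shows that byte-level view using the Python standard library. The helper name `byte_view` and the example word are hypothetical and not taken from the paper, and no actual model tokenizer is loaded.

```python
def byte_view(word: str) -> list[int]:
    """Return the UTF-8 byte values of a word (hypothetical helper).

    A byte-level model consumes a sequence like this one, so rare or
    archaic characters never fall outside its vocabulary.
    """
    return list(word.encode("utf-8"))


# Illustrative word using the archaic long s (U+017F), which takes two bytes.
archaic = "Congreſs"
modern = "Congress"

archaic_bytes = byte_view(archaic)
modern_bytes = byte_view(modern)

# The byte sequence round-trips losslessly back to the original spelling.
assert bytes(archaic_bytes).decode("utf-8") == archaic
```

Whether this byte-level property fully explains the reported advantage on historical documents is not something the summary states, so treat it as one plausible reading.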
Abstract
Optical character recognition remains critical infrastructure for document digitization, yet state-of-the-art performance is often restricted to well-resourced institutions by prohibitive computational barriers. End-to-end transformer architectures achieve strong accuracy but demand hundreds of GPU hours for domain adaptation, limiting accessibility for practitioners and digital humanities scholars. We present a modular detection-and-correction framework that achieves near-SOTA accuracy with single-GPU training. Our approach decouples lightweight visual character detection (domain-agnostic) from domain-specific linguistic correction using pretrained sequence models including T5, ByT5, and BART. By training the correctors entirely on synthetic noise, we enable annotation-free domain adaptation without requiring labeled target images. Evaluating across modern clean handwriting, cursive…
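The idea of training a corrector on synthetic noise can be sketched as follows: take clean text from the target domain, corrupt it with random character errors, and use each (noisy, clean) pair as a sequence-to-sequence training example. This is a minimal sketch under stated assumptions, not the authors' noise model. The function names, the three error types, the lowercase replacement alphabet, and the default `error_rate` are all hypothetical choices made for illustration.

```python
import random
import string


def add_synthetic_noise(text: str, error_rate: float = 0.1, seed=None) -> str:
    """Corrupt text with random character-level errors (assumed noise model).

    Each character is, with probability error_rate, either substituted,
    deleted, or followed by a spurious inserted character.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase  # assumption: replacement characters
    out = []
    for ch in text:
        if rng.random() >= error_rate:
            out.append(ch)  # keep the character unchanged
            continue
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "substitute":
            out.append(rng.choice(alphabet))
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(alphabet))
        # "delete": append nothing
    return "".join(out)


def make_training_pairs(clean_lines, error_rate: float = 0.1, seed: int = 0):
    """Build (noisy_input, clean_target) pairs for a seq2seq corrector."""
    rng = random.Random(seed)
    pairs = []
    for line in clean_lines:
        line_seed = rng.randrange(2**32)  # per-line seed keeps runs reproducible
        pairs.append((add_synthetic_noise(line, error_rate, line_seed), line))
    return pairs


pairs = make_training_pairs(["the quick brown fox", "jumps over the lazy dog"])
```

Because only clean text is needed, pairs like these can be generated for a new domain without labeled images, which is the annotation-free property the abstract describes. A realistic setup would likely shape the noise to resemble the detector's actual confusions, which this sketch does not attempt.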
