Efficient Domain Adaptation for Text Line Recognition via Decoupled Language Models

Arundhathi Dev; Justin Zhan

arXiv:2603.28028·cs.CV·March 31, 2026


TL;DR

This paper introduces a modular OCR framework that decouples visual detection from linguistic correction, enabling resource-efficient domain adaptation with near state-of-the-art accuracy across diverse document types.

Contribution

The paper presents a lightweight, decoupled approach in which pretrained sequence models are trained as correctors on synthetic noise, significantly reducing the computational requirements of domain adaptation.
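To make the idea of training a corrector on synthetic noise concrete, the following is a minimal sketch of how (noisy, clean) text pairs could be generated from unlabeled target-domain text. It is not the authors' noise model: the confusion table, the three edit operations, the `rate` parameter, and the function names `add_synthetic_noise` and `make_training_pairs` are all assumptions chosen for illustration.

```python
import random

# Hypothetical confusion table: visually similar characters that a
# character detector might plausibly mix up. Not taken from the paper.
CONFUSIONS = {
    "l": ["1", "I"],
    "o": ["0"],
    "e": ["c"],
    "m": ["rn"],
    "u": ["v"],
    "s": ["5"],
}

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def add_synthetic_noise(text, rate=0.1, rng=None):
    """Return a corrupted copy of `text`.

    Each character is independently corrupted with probability `rate`
    by one of three operations: substitution, deletion, or insertion.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)  # keep the character unchanged
            continue
        op = rng.choice(["substitute", "delete", "insert"])
        if op == "substitute":
            # Fall back to the original character if no confusion is listed.
            out.append(rng.choice(CONFUSIONS.get(ch.lower(), [ch])))
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        # "delete": append nothing
    return "".join(out)


def make_training_pairs(clean_lines, rate=0.1, seed=0):
    """Build (noisy_input, clean_target) pairs for a seq2seq corrector."""
    rng = random.Random(seed)
    return [(add_synthetic_noise(line, rate, rng), line) for line in clean_lines]
```

Pairs produced this way need only clean text from the target domain, not labeled images, which is the property the summary describes as annotation-free adaptation. A realistic noise model would presumably be fitted to the actual error profile of the detector.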

Findings

01. T5-Base performs best on modern text with standard vocabulary.

02. ByT5-Base excels on historical documents by reconstructing archaic spellings.

03. The approach reduces compute by approximately 95% compared to end-to-end transformers.
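One plausible reading of the T5 versus ByT5 split, offered here as an assumption rather than something the summary states, is that ByT5 operates on raw UTF-8 bytes instead of a fixed subword vocabulary, so unusual historical glyphs and spellings never fall outside its input space. The sketch below shows that byte-level mapping in plain Python. The helper names are hypothetical, and the offset of 3 mirrors ByT5's convention of reserving the first three ids for special tokens.

```python
def byte_level_ids(text, offset=3):
    """Map text to UTF-8 byte values shifted past `offset` reserved ids."""
    return [b + offset for b in text.encode("utf-8")]


def decode_byte_level_ids(ids, offset=3):
    """Invert byte_level_ids: shift ids back and decode the UTF-8 bytes."""
    return bytes(i - offset for i in ids).decode("utf-8")


# "vſe" is an archaic spelling of "use" with a long s (U+017F),
# which occupies two bytes in UTF-8.
ids = byte_level_ids("vſe")
print(ids)                         # [121, 200, 194, 104]
print(decode_byte_level_ids(ids))  # vſe
```

Every string round-trips through this encoding regardless of spelling, at the cost of longer sequences than a subword tokenizer would produce for ordinary modern text.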

Abstract

Optical character recognition remains critical infrastructure for document digitization, yet state-of-the-art performance is often restricted to well-resourced institutions by prohibitive computational barriers. End-to-end transformer architectures achieve strong accuracy but demand hundreds of GPU hours for domain adaptation, limiting accessibility for practitioners and digital humanities scholars. We present a modular detection-and-correction framework that achieves near-SOTA accuracy with single-GPU training. Our approach decouples lightweight visual character detection (domain-agnostic) from domain-specific linguistic correction using pretrained sequence models including T5, ByT5, and BART. By training the correctors entirely on synthetic noise, we enable annotation-free domain adaptation without requiring labeled target images. Evaluating across modern clean handwriting, cursive…
