Error Patterns in Historical OCR: A Comparative Analysis of TrOCR and a Vision-Language Model
Ari Vesalainen, Eetu Mäkelä, Laura Ruotsalainen, and Mikko Tolonen

TL;DR
This paper compares a dedicated OCR transformer (TrOCR) with a general-purpose Vision-Language Model (Qwen) on eighteenth-century printed English texts. The two differ in error patterns, robustness, and orthographic fidelity, which makes architecture-aware evaluation important for scholarly use.
Contribution
It provides a systematic analysis of OCR error structures in historical texts, highlighting how model architecture influences error types and implications for scholarly digitization.
Findings
Qwen achieves lower CER/WER and is more robust to degraded input.
TrOCR maintains orthographic fidelity but is prone to cascading errors.
Model architecture biases affect error locality and detectability.
Abstract
Optical Character Recognition (OCR) of eighteenth-century printed texts remains challenging due to degraded print quality, archaic glyphs, and non-standardized orthography. Although transformer-based OCR systems and Vision-Language Models (VLMs) achieve strong aggregate accuracy, metrics such as Character Error Rate (CER) and Word Error Rate (WER) provide limited insight into their reliability for scholarly use. We compare a dedicated OCR transformer (TrOCR) and a general-purpose Vision-Language Model (Qwen) on line-level historical English texts using length-weighted accuracy metrics and hypothesis-driven error analysis. While Qwen achieves lower CER/WER and greater robustness to degraded input, it exhibits selective linguistic regularization and orthographic normalization that may silently alter historically meaningful forms. TrOCR preserves orthographic fidelity more consistently…
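The abstract refers to length-weighted CER and WER without giving formulas. A minimal sketch of one common reading follows, assuming "length-weighted" means total edit distance divided by total reference length across all lines (a micro-average), which may differ from the paper's exact definition. The function names and the sample line pair are hypothetical and not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_error_rate(pairs, tokenize):
    """Assumed length-weighted rate: summed edits / summed reference length."""
    edits = 0
    length = 0
    for ref, hyp in pairs:
        ref_units, hyp_units = tokenize(ref), tokenize(hyp)
        edits += edit_distance(ref_units, hyp_units)
        length += len(ref_units)
    return edits / length if length else 0.0


def cer(pairs):
    """Character Error Rate over (reference, hypothesis) line pairs."""
    return corpus_error_rate(pairs, tokenize=list)


def wer(pairs):
    """Word Error Rate over (reference, hypothesis) line pairs."""
    return corpus_error_rate(pairs, tokenize=str.split)


# Hypothetical example: the hypothesis modernizes "publick" to "public".
sample_pairs = [("the publick good", "the public good"), ("musick", "musick")]
print(cer(sample_pairs), wer(sample_pairs))
```

In this made-up example, the normalized spelling costs a single character edit, so it barely moves CER even though it changes a historically meaningful form. That is the kind of silent alteration the abstract says aggregate metrics fail to surface.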
Taxonomy
Topics: Handwritten Text Recognition Techniques · Digital Humanities and Scholarship · Natural Language Processing Techniques
