Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
Erez Meoded

TL;DR
This paper demonstrates that applying transformer-based models with specialized data augmentation and ensemble techniques significantly improves handwritten text recognition accuracy on 16th-century Latin manuscripts, advancing the field of historical document digitization.
Contribution
The study introduces four novel augmentation methods tailored for historical handwriting and evaluates ensemble learning, achieving state-of-the-art results in HTR for archival Latin manuscripts.
Findings
Best single model (Elastic augmentation) achieves a CER of 1.86
Top-5 voting ensemble reduces CER to 1.60, a 42% improvement
Domain-specific augmentations significantly enhance recognition accuracy
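The summary does not specify how the voting ensemble combines model outputs. As a rough illustration only, the sketch below assumes the simplest possible scheme: each model emits a full-line transcription and the most frequent string wins, with ties broken in favor of the higher-ranked model. The function name `vote_line` and the line-level granularity are assumptions for illustration, not the authors' method.

```python
from collections import Counter


def vote_line(candidates: list[str]) -> str:
    """Pick the most frequent transcription among model outputs.

    `candidates` is assumed to be ordered from the best-ranked model
    to the worst, so ties fall back to the better single model.
    """
    if not candidates:
        raise ValueError("need at least one candidate transcription")
    counts = Counter(candidates)
    best = max(counts.values())
    # Return the first candidate (in ranking order) with the top count.
    for text in candidates:
        if counts[text] == best:
            return text
    raise AssertionError("unreachable: some candidate has the top count")


# Five hypothetical model outputs for one text line.
outputs = ["in principio", "in principlo", "in principio", "in princlpio", "in principio"]
winner = vote_line(outputs)
```

A character-level or confidence-weighted vote would be an equally plausible reading of "voting ensemble"; this sketch only shows the general idea.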
Abstract
Historical handwritten text recognition (HTR) is essential for unlocking the cultural and scholarly value of archival documents, yet digitization is often hindered by scarce transcriptions, linguistic variation, and highly diverse handwriting styles. In this study, we apply TrOCR, a state-of-the-art transformer-based HTR model, to 16th-century Latin manuscripts authored by Rudolf Gwalther. We investigate targeted image preprocessing and a broad suite of data augmentation techniques, introducing four novel augmentation methods designed specifically for historical handwriting characteristics. We also evaluate ensemble learning approaches to leverage the complementary strengths of augmentation-trained models. On the Gwalther dataset, our best single-model augmentation (Elastic) achieves a Character Error Rate (CER) of 1.86, while a top-5 voting ensemble achieves a CER of 1.60 -…
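For readers unfamiliar with the metric, Character Error Rate is conventionally the Levenshtein edit distance between the predicted and reference transcription, divided by the reference length. The sketch below is a minimal pure-Python version; it assumes the figures quoted above are percentages, which this excerpt does not state explicitly. The helper `relative_reduction` is likewise illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction, in percent, going from `before` to `after`."""
    return 100.0 * (before - after) / before


single_to_ensemble = relative_reduction(1.86, 1.60)
```

Between the two figures quoted here (1.86 and 1.60) the relative reduction is roughly 14%, so the 42% improvement is presumably measured against a different baseline that this excerpt does not give.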
