Transformer-based HTR for Historical Documents
Phillip Benjamin Ströbel, Simon Clematide, Martin Volk, Tobias Hodel

TL;DR
This paper applies the TrOCR transformer framework to real-world historical manuscripts and shows that it can outperform a state-of-the-art HTR system (Transkribus), adapting to other Latin-alphabet languages with little training data.
Contribution
It shows that TrOCR is a strong, adaptable model for historical handwritten text recognition that can beat Transkribus without the baseline information Transkribus relies on for its best performance.
Findings
TrOCR outperforms Transkribus in HTR tasks.
TrOCR adapts easily to multiple Latin-based languages.
TrOCR needs little training data for effective transfer learning.
Abstract
We apply the TrOCR framework to real-world, historical manuscripts and show that TrOCR per se is a strong model, ideal for transfer learning. TrOCR has been trained on English only, but it can adapt to other languages that use the Latin alphabet fairly easily and with little training material. We compare TrOCR against a SOTA HTR framework (Transkribus) and show that it can beat such systems. This finding is essential since Transkribus performs best when it has access to baseline information, which is not needed at all to fine-tune TrOCR.
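The abstract does not say which metric underlies the comparison with Transkribus. HTR systems are commonly compared by character error rate (CER), so the following is a minimal sketch of that computation, assuming CER is the measure of interest. The function names `levenshtein` and `cer` are illustrative and not taken from the paper or from any library.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Return the minimum number of character insertions, deletions,
    and substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edit distance divided by total
    reference length (an assumed, commonly used definition)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must align one-to-one")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Under this definition, two systems transcribing the same line images can be compared by passing each system's output lines, together with the shared ground-truth lines, to `cer`; the lower value indicates the better transcription.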
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Softmax · Residual Connection · Position-Wise Feed-Forward Layer · TrOCR
