Transformer-based HTR for Historical Documents

Phillip Benjamin Ströbel; Simon Clematide; Martin Volk; Tobias Hodel

arXiv:2203.11008 · cs.CV · March 22, 2022 · 5 citations

TL;DR

This paper shows that the Transformer-based TrOCR framework transcribes historical manuscripts effectively, outperforming the state-of-the-art HTR system Transkribus while needing little training data and adapting easily to other languages written in the Latin alphabet.

Contribution

It shows that TrOCR is a strong, adaptable model for handwritten text recognition (HTR) of historical documents, surpassing Transkribus without requiring the baseline information that Transkribus depends on for its best results.

Findings

01. TrOCR outperforms Transkribus on historical HTR.
02. TrOCR adapts easily to other languages written in the Latin alphabet.
03. Only a small amount of training data is needed for effective transfer learning.

Abstract

We apply the TrOCR framework to real-world, historical manuscripts and show that TrOCR per se is a strong model, ideal for transfer learning. TrOCR has been trained on English only, but it can adapt to other languages that use the Latin alphabet fairly easily and with little training material. We compare TrOCR against a SOTA HTR framework (Transkribus) and show that it can beat such systems. This finding is essential since Transkribus performs best when it has access to baseline information, which is not needed at all to fine-tune TrOCR.
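The claim that TrOCR beats Transkribus is a claim about recognition error. HTR systems are commonly compared by character error rate (CER): the character-level edit distance between a system's output and the ground-truth transcription, divided by the length of the ground truth. This page does not say which metric or averaging convention the paper used, so the Python below is only a sketch of a standard corpus-level CER, not the authors' evaluation code; the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn ref into hyp (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into the empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from ref
                curr[j - 1] + 1,         # insert h into ref
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edit distance over total reference length."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference line")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return edits / chars if chars else 0.0


# Made-up example lines, not data from the paper:
# one substitution in a six-character reference gives a CER of 1/6.
print(cer(["Zürich"], ["Zurich"]))
```

Summing edits and reference lengths over the whole test set, rather than averaging per-line CERs, keeps very short lines from dominating the score; this is one common convention, not necessarily the one used in the paper.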

Code & Models

Repositories

EriCongMa/awesome-transformer-ocr
paddle

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Softmax · Residual Connection · Position-Wise Feed-Forward Layer · TrOCR
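The Methods list names the standard Transformer building blocks from which TrOCR's encoder-decoder is assembled. As a rough illustration of the central one, the Python below computes single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, on plain lists. It is a toy sketch, not TrOCR's implementation: the real model uses multi-head attention with learned linear projections, residual connections and layer normalization, and the helper names here are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention on lists of lists.

    Q: n x d_k queries, K: m x d_k keys, V: m x d_v values.
    Returns n x d_v: each output row is a softmax-weighted average of V's rows.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    out = []
    for q in Q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Mix the value vectors column by column using the attention weights.
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(d_v)])
    return out


# Two identical keys receive equal weight, so the output is the mean of V's rows.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]],
                                   [[2.0, 0.0], [0.0, 2.0]]))
```

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights.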