Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation
Filipe Lauar, Valentin Laurent

TL;DR
This paper shows that transfer learning can effectively adapt TrOCR to Spanish: fine-tuning an English pre-trained model outperforms other open-source OCR models on Spanish, and the authors provide resources for dataset creation and benchmarking.
Contribution
It introduces a transfer learning approach for adapting TrOCR to Spanish, compares two adaptation methods, and offers a resource-efficient dataset creation pipeline and benchmark.
Findings
Fine-tuning English TrOCR on Spanish outperforms the language-specific decoder approach.
The proposed dataset creation pipeline is resource-efficient and effective.
The resulting Spanish TrOCR model achieves state-of-the-art performance in open-source OCR for Spanish.
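Comparisons like the ones above are typically scored by how far a model's transcription is from the ground truth. The summary does not say which metric the paper uses, so the following is only a minimal sketch assuming character error rate (CER), a common OCR metric. The function names `levenshtein` and `cer` are illustrative and are not taken from the paper.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (classic dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # reference prefix of length i vs. empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edit distance / total reference characters.
    (Assumed metric; the summary does not name the one used in the paper.)"""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

For example, `cer(["hola", "mundo"], ["hola", "mundi"])` counts one substitution over nine reference characters, giving 1/9. Lower values mean the transcription is closer to the reference.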
Abstract
This study explores the transfer learning capabilities of the TrOCR architecture for Spanish. TrOCR is a transformer-based Optical Character Recognition (OCR) model renowned for its state-of-the-art performance on English benchmarks. Inspired by Li et al.'s assertion regarding its adaptability to multilingual text recognition, we investigate two distinct approaches to adapting the model to a new language: integrating an English TrOCR encoder with a language-specific decoder and training the model on that language, and fine-tuning the English base TrOCR model on data in the new language. Due to the scarcity of publicly available datasets, we present a resource-efficient pipeline for creating OCR datasets in any language, along with a comprehensive benchmark of the different image generation methods employed, with a focus on Visual Rich Documents (VRDs). Additionally, we offer a comparative…
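The two adaptation strategies named in the abstract differ mainly in which weights the model starts from before training on the new language. The toy sketch below illustrates only that difference. It uses plain dictionaries with placeholder strings instead of tensors, and the `encoder.` / `decoder.` key prefixes and all function names are assumptions made for illustration, not the layout of any actual TrOCR checkpoint or the authors' code.

```python
from typing import Dict


def split_by_prefix(state: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Return only the entries whose parameter name starts with `prefix`."""
    return {name: value for name, value in state.items() if name.startswith(prefix)}


def init_with_new_decoder(english_trocr: Dict[str, str],
                          target_decoder: Dict[str, str]) -> Dict[str, str]:
    """Approach 1 (as described in the abstract): keep the English TrOCR
    encoder and pair it with a language-specific decoder."""
    init = split_by_prefix(english_trocr, "encoder.")
    init.update(split_by_prefix(target_decoder, "decoder."))
    return init


def init_for_fine_tuning(english_trocr: Dict[str, str]) -> Dict[str, str]:
    """Approach 2: start from every English weight, encoder and decoder,
    then continue training on data in the new language."""
    return dict(english_trocr)


# Hypothetical checkpoints: strings stand in for weight tensors.
english = {"encoder.layer0": "en-enc", "decoder.layer0": "en-dec"}
spanish_decoder = {"decoder.layer0": "es-dec"}

swapped = init_with_new_decoder(english, spanish_decoder)
fine_tuned_start = init_for_fine_tuning(english)
```

In both cases the resulting weights are then trained on target-language data. Only the starting point of the decoder differs.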
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Interpreting and Communication in Healthcare
Methods: Softmax · Linear Layer · Dense Connections · Multi-Head Attention · Residual Connection · Attention Is All You Need · Layer Normalization · Focus · Position-Wise Feed-Forward Layer · Balanced Selection
