TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

TL;DR
TrOCR introduces an end-to-end Transformer-based approach for optical character recognition that leverages pre-trained models for improved accuracy across printed, handwritten, and scene text recognition tasks.
Contribution
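The paper presents a novel Transformer-based OCR model that simplifies the usual pipeline (a CNN for image understanding, an RNN for character-level generation, and often a separate language model for post-processing) by combining image understanding and wordpiece-level text generation in a single end-to-end system.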
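In a single end-to-end system, the image is encoded once, and the text decoder then emits wordpiece ids one at a time. Each id is conditioned on the encoder output and on the ids generated so far. The following is a minimal, generic sketch of that loop using greedy search. `encode`, `decode_step`, and the toy stubs are hypothetical stand-ins, not TrOCR's released code. Practical decoders often use beam search instead of a greedy argmax.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces, chosen for illustration only:
#   encode(image) -> features produced once by the image encoder
#   decode_step(features, prefix_ids) -> one score per vocabulary entry
Encoder = Callable[[object], object]
DecodeStep = Callable[[object, Sequence[int]], Sequence[float]]


def greedy_recognize(image: object, encode: Encoder, decode_step: DecodeStep,
                     bos_id: int, eos_id: int, max_len: int = 32) -> List[int]:
    """Return the wordpiece ids generated for `image`, excluding BOS and EOS."""
    features = encode(image)   # image understanding: run the encoder once
    tokens = [bos_id]          # generation starts from a begin-of-sequence id
    for _ in range(max_len):
        scores = decode_step(features, tokens)  # conditions on features + prefix
        # Greedy choice: take the highest-scoring id (the first one on ties).
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]


if __name__ == "__main__":
    # Toy vocabulary: 0 = BOS, 1 = EOS, 2 = "Hel", 3 = "lo" (hypothetical wordpieces).
    def toy_encode(image):
        return [2, 3]  # pretend the encoder "read" these two wordpieces

    def toy_step(features, prefix):
        pos = len(prefix) - 1
        target = features[pos] if pos < len(features) else 1  # EOS when done
        return [1.0 if i == target else 0.0 for i in range(4)]

    print(greedy_recognize("line.png", toy_encode, toy_step, bos_id=0, eos_id=1))  # prints [2, 3]
```

The decoder works on wordpieces rather than single characters, so a word such as "Hello" can be emitted in fewer steps than a character-level RNN would need.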
Findings
Outperforms state-of-the-art models on multiple text recognition tasks
Pre-training on large-scale synthetic data, followed by fine-tuning on human-labeled datasets, improves accuracy
Applicable to printed, handwritten, and scene text recognition
Abstract
Text recognition is a long-standing research problem for document digitalization. Existing approaches are usually built based on CNN for image understanding and RNN for char-level text generation. In addition, another language model is usually needed to improve the overall accuracy as a post-processing step. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on the printed, handwritten and scene text recognition tasks. The TrOCR models and code are publicly…
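A pre-trained image Transformer needs the image as a sequence of tokens. A common ViT-style approach, assumed here rather than stated in the abstract, resizes the image to a fixed size and cuts it into non-overlapping square patches. Each patch is flattened into one vector, which the encoder then linearly projects. The sketch below assumes a 384×384 RGB input and 16×16 patches. `image_to_patch_sequence` is a hypothetical helper.

```python
import numpy as np


def image_to_patch_sequence(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a (num_patches, patch * patch * C) array.

    Resizing to a fixed size (e.g. 384x384) is assumed to have happened already.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C): group pixels patch by patch
    grid = image.reshape(h // patch, patch, w // patch, patch, c).transpose(0, 2, 1, 3, 4)
    # One flat vector per patch, ordered left to right, then top to bottom
    return grid.reshape(-1, patch * patch * c)


if __name__ == "__main__":
    img = np.random.rand(384, 384, 3).astype(np.float32)
    print(image_to_patch_sequence(img).shape)  # (576, 768)
```

With these assumptions, a 384×384 image yields (384/16)² = 576 patches of 16·16·3 = 768 values each. This sequence plays the role that a CNN feature map plays in conventional OCR pipelines.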
Code & Models
- 🤗 microsoft/trocr-base-handwritten · model · 163k dl · ♡ 487
- 🤗 microsoft/trocr-small-handwritten · model · 12k dl · ♡ 63
- 🤗 microsoft/trocr-base-printed · model · 1.2M dl · ♡ 204
- 🤗 microsoft/trocr-base-stage1 · model · 19k dl · ♡ 16
- 🤗 microsoft/trocr-large-handwritten · model · 341k dl · ♡ 157
- 🤗 microsoft/trocr-large-printed · model · 712k dl · ♡ 179
- 🤗 microsoft/trocr-large-stage1 · model · 1.3k dl · ♡ 26
- 🤗 microsoft/trocr-small-printed · model · 38k dl · ♡ 46
- 🤗 microsoft/trocr-small-stage1 · model · 11k dl · ♡ 13
- 🤗 vukpetar/trocr-small-photomath · model · 162 dl · ♡ 6
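The checkpoints above can presumably be loaded with the Hugging Face `transformers` library. The sketch below assumes its `TrOCRProcessor` and `VisionEncoderDecoderModel` classes, plus PyTorch and Pillow. It also assumes a cropped image containing a single line of text. `trocr_checkpoint`, `recognize_line`, and `line.png` are hypothetical names. Judging by the naming, the `stage1` checkpoints are likely the pre-trained weights before fine-tuning. A `printed` or `handwritten` checkpoint is therefore the safer choice for direct transcription.

```python
# Checkpoint ids follow the microsoft/trocr-{size}-{variant} pattern in the list above.
SIZES = ("small", "base", "large")
VARIANTS = ("printed", "handwritten", "stage1")


def trocr_checkpoint(size: str, variant: str) -> str:
    """Build a Hub id such as 'microsoft/trocr-base-handwritten' (hypothetical helper)."""
    if size not in SIZES or variant not in VARIANTS:
        raise ValueError(f"unknown TrOCR checkpoint: size={size!r}, variant={variant!r}")
    return f"microsoft/trocr-{size}-{variant}"


def recognize_line(image_path: str,
                   checkpoint: str = "microsoft/trocr-base-handwritten") -> str:
    """Transcribe one text-line image; weights are downloaded on first use."""
    # Imported lazily so trocr_checkpoint() works without these packages installed.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)  # image preprocessing + tokenizer
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)  # image encoder + text decoder

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)  # autoregressive wordpiece generation
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


if __name__ == "__main__":
    print(trocr_checkpoint("large", "printed"))  # microsoft/trocr-large-printed
    # Requires transformers, torch, Pillow, and network access:
    # print(recognize_line("line.png", trocr_checkpoint("base", "printed")))
```

Loading the processor and model inside `recognize_line` keeps the sketch short. When transcribing many lines, load them once and reuse them.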
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · TrOCR · Layer Normalization · Dense Connections · Multi-Head Attention · Softmax · Label Smoothing · Byte Pair Encoding · Dropout
