Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images
Hongkuan Zhang, Edward Whittaker, Ikuo Kitagishi

TL;DR
This paper introduces a localization-free, document-level OCR model for receipts that transcribes entire images into ordered text sequences, eliminating the need for separate text localization steps.
Contribution
The authors adapt and fine-tune a pretrained instance-level OCR model, TrOCR, to recognize full-page receipt images without explicit text localization, improving accuracy and practicality.
Findings
Achieved an F1-score of 87.8 and a character error rate (CER) of 4.98% on receipt images.
Outperformed the baseline, which reached only a 48.5 F1-score and a 50.6% CER.
Generated text sequences in reading order suitable for real-world use.
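To make the CER figures above concrete, here is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the reference transcription and the model output, divided by the reference length. This summary does not state the paper's exact CER definition or any text normalization it applies, so the standard definition is an assumption here, and the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a character of ref
                curr[j - 1] + 1,     # drop a character of hyp
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length.

    Assumes the standard definition; the paper may normalize differently.
    """
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character in an 11-character reference line.
    print(f"{cer('TOTAL 12.50', 'TOTAL 12.80'):.2%}")
```

Because the edit distance can exceed the reference length when the output contains many spurious characters, a CER computed this way is not capped at 100%.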
Abstract
Digitization of scanned receipts aims to extract text from receipt images and save it into structured documents. This is usually split into two sub-tasks: text localization and optical character recognition (OCR). Most existing OCR models focus only on cropped text instance images, which require bounding box information provided by a text region detection model. Introducing an additional detector to identify the text instance images in advance adds complexity; however, instance-level OCR models have very low accuracy when processing the whole image for document-level OCR, such as receipt images containing multiple text lines arranged in various layouts. To this end, we propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end. Specifically, we finetune the pretrained instance-level model…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection · Softmax · Layer Normalization · TrOCR
