Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images

Hongkuan Zhang; Edward Whittaker; Ikuo Kitagishi

arXiv:2212.05525·cs.CL·October 17, 2023

TL;DR

This paper introduces a localization-free, document-level OCR model for receipts that transcribes entire images into ordered text sequences, eliminating the need for separate text localization steps.

Contribution

The authors adapt and fine-tune a pretrained instance-level OCR model, TrOCR, to recognize full-page receipt images without explicit text localization, improving accuracy and practicality.

Findings

01

Achieved 87.8 F1-score and 4.98% CER on receipt images.

02

Outperformed the baseline, which reached only 48.5 F1-score and 50.6% CER.

03

Generated text sequences in reading order suitable for real-world use.
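
The two metrics quoted above can be illustrated with a minimal sketch. It assumes CER is the character-level edit distance divided by the reference length, and that F1 is computed over whitespace-separated words treated as a multiset. This page does not spell out the paper's exact evaluation protocol, so both definitions and the function names (`edit_distance`, `cer`, `word_f1`) are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate (assumed definition): edits / reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def word_f1(ref: str, hyp: str) -> float:
    """Word-level F1 over multisets of tokens (assumed definition)."""
    ref_counts = Counter(ref.split())
    hyp_counts = Counter(hyp.split())
    # Multiset intersection counts each word at most as often as it
    # appears in both the reference and the hypothesis.
    overlap = sum((ref_counts & hyp_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "TOTAL 12.50"   # hypothetical ground-truth line
    hypothesis = "TOTAL 12.60"  # hypothetical model output
    print(f"CER: {cer(reference, hypothesis):.3f}")
    print(f"F1:  {word_f1(reference, hypothesis):.3f}")
```

Under these assumptions a single wrong digit costs little in CER but invalidates a whole word for F1, which is one reason the two numbers are usually reported together.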

Abstract

Digitization of scanned receipts aims to extract text from receipt images and save it into structured documents. This is usually split into two sub-tasks: text localization and optical character recognition (OCR). Most existing OCR models focus only on cropped text instance images, which require the bounding box information provided by a text region detection model. Introducing an additional detector to identify the text instance images in advance adds complexity; however, instance-level OCR models have very low accuracy when processing the whole image for document-level OCR, such as receipt images containing multiple text lines arranged in various layouts. To this end, we propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end. Specifically, we finetune the pretrained instance-level model…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection · Softmax · Layer Normalization · TrOCR