Towards Analysing Invoices and Receipts with Amazon Textract

Sneha Oommen; Gabby Sanchez; Cassandra T. Britto; Di Wang; Jordan Chiou; Maria Spichkova

arXiv:2512.19958 · cs.IR · December 24, 2025

Open Access

TL;DR

This paper evaluates how effectively AWS Textract extracts data from diverse receipts, highlights its strengths and limitations, and proposes strategies to improve accuracy under varied image and layout conditions.

Contribution

The study provides a qualitative analysis of Textract's performance on receipts of varied formats and conditions, identifying typical issues and suggesting mitigation strategies for better extraction accuracy.

Findings

01

Receipt totals are consistently detected by Textract.

02

Image quality and layout irregularities affect extraction accuracy.

03

Mitigation strategies are proposed to make data extraction more robust.

Abstract

This paper presents an evaluation of AWS Textract in the context of extracting data from receipts. We analyse Textract's functionalities using a dataset that includes receipts of varied formats and conditions. Our analysis provides a qualitative view of Textract's strengths and limitations. While the receipt totals were consistently detected, we also observed typical issues and irregularities that were often influenced by image quality and layout. Based on the analysis of these observations, we propose mitigation strategies.
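
As an illustration of what detecting a receipt total might look like in practice, the following Python sketch reads the TOTAL summary field from a response shaped like the one returned by Textract's AnalyzeExpense operation. The abstract does not say which Textract operation the authors used, so the choice of AnalyzeExpense is an assumption. The helper names `find_totals` and `analyse_receipt`, the 80.0 confidence threshold, and the hand-written `sample_response` are likewise hypothetical. None of this is the authors' method.

```python
from typing import Any, Dict, List


def find_totals(response: Dict[str, Any], min_confidence: float = 80.0) -> List[Dict[str, Any]]:
    """Collect TOTAL summary fields from an AnalyzeExpense-style response.

    min_confidence is an arbitrary, illustrative threshold (Textract reports
    confidence on a 0-100 scale); values below it are flagged for review.
    """
    totals = []
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            # Skip every summary field that is not typed as the receipt total.
            if field.get("Type", {}).get("Text") != "TOTAL":
                continue
            value = field.get("ValueDetection", {})
            confidence = value.get("Confidence", 0.0)
            totals.append({
                "text": value.get("Text", ""),
                "confidence": confidence,
                "needs_review": confidence < min_confidence,
            })
    return totals


def analyse_receipt(path: str) -> List[Dict[str, Any]]:
    """Send one local receipt image to Textract and return its totals.

    Requires boto3 and configured AWS credentials; not exercised below.
    """
    import boto3  # imported lazily so find_totals works without boto3

    client = boto3.client("textract")
    with open(path, "rb") as fh:
        response = client.analyze_expense(Document={"Bytes": fh.read()})
    return find_totals(response)


# Hand-written stand-in for a real response (hypothetical values).
sample_response = {
    "ExpenseDocuments": [
        {
            "SummaryFields": [
                {"Type": {"Text": "VENDOR_NAME"},
                 "ValueDetection": {"Text": "Corner Cafe", "Confidence": 95.0}},
                {"Type": {"Text": "TOTAL"},
                 "ValueDetection": {"Text": "$23.45", "Confidence": 98.7}},
            ]
        },
        {
            "SummaryFields": [
                {"Type": {"Text": "TOTAL"},
                 "ValueDetection": {"Text": "$7.10", "Confidence": 41.2}},
            ]
        },
    ]
}

if __name__ == "__main__":
    for total in find_totals(sample_response):
        print(total)
```

Calling `find_totals(sample_response)` needs no AWS account, whereas `analyse_receipt` does. Flagging low-confidence values for manual review is one plausible way to handle poor image quality, not a strategy taken from the paper.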

Taxonomy

Topics: Digital Humanities and Scholarship · Handwritten Text Recognition Techniques · Data Visualization and Analytics