Towards Analysing Invoices and Receipts with Amazon Textract
Sneha Oommen, Gabby Sanchez, Cassandra T. Britto, Di Wang, Jordan Chiou, Maria Spichkova

TL;DR
This paper evaluates AWS Textract's effectiveness in extracting data from diverse receipts, highlighting its strengths and limitations and proposing strategies to improve accuracy under various conditions.
Contribution
The study provides a detailed analysis of Textract's performance on receipts, identifying key issues and suggesting mitigation strategies for better extraction accuracy.
Findings
Receipt totals are reliably detected by Textract.
Image quality and layout irregularities affect extraction accuracy.
Proposed mitigation strategies improve data extraction robustness.
Abstract
This paper presents an evaluation of AWS Textract in the context of extracting data from receipts. We analyse Textract's functionality using a dataset that includes receipts of varied formats and conditions. Our analysis provided a qualitative view of Textract's strengths and limitations. While receipt totals were consistently detected, we also observed typical issues and irregularities that were often influenced by image quality and layout. Based on these observations, we propose mitigation strategies.
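To make the notion of "detecting the receipt total" concrete, here is a minimal sketch of how a total could be read out of a Textract expense-analysis response. It is not the authors' pipeline, which this summary does not describe. It assumes the response follows the shape returned by Textract's AnalyzeExpense operation (`ExpenseDocuments`, `SummaryFields`, `Type`, `ValueDetection`). The function name `find_total`, the 80.0 confidence threshold, and the sample data are hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple


def find_total(response: dict, min_confidence: float = 80.0) -> Optional[Tuple[str, float]]:
    """Return (text, confidence) of the highest-confidence TOTAL field, or None.

    `response` is assumed to have the AnalyzeExpense response shape.
    `min_confidence` is an illustrative threshold, not a recommended value.
    """
    best = None
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            # Keep only summary fields that Textract labelled as the total.
            if field.get("Type", {}).get("Text") != "TOTAL":
                continue
            value = field.get("ValueDetection", {})
            text = value.get("Text")
            confidence = value.get("Confidence", 0.0)
            # Skip empty or low-confidence detections, e.g. from blurry scans.
            if text is None or confidence < min_confidence:
                continue
            if best is None or confidence > best[1]:
                best = (text, confidence)
    return best


# Hand-written sample in the assumed response shape (not real Textract output).
sample = {
    "ExpenseDocuments": [
        {
            "SummaryFields": [
                {
                    "Type": {"Text": "VENDOR_NAME", "Confidence": 99.0},
                    "ValueDetection": {"Text": "Corner Cafe", "Confidence": 98.4},
                },
                {
                    "Type": {"Text": "TOTAL", "Confidence": 99.2},
                    "ValueDetection": {"Text": "$12.50", "Confidence": 97.1},
                },
            ]
        }
    ]
}

print(find_total(sample))
```

With a live service, the `response` dictionary would typically come from a call such as boto3's `client("textract").analyze_expense(Document={"Bytes": image_bytes})`. Returning `None` when no total clears the threshold lets a caller route that receipt to manual review instead of trusting a weak detection.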
Taxonomy
Topics: Digital Humanities and Scholarship · Handwritten Text Recognition Techniques · Data Visualization and Analytics
