AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification
Abdelrahman Abdallah, Mahmoud Abdalla, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt

TL;DR
AMuRD is a comprehensive multilingual dataset with detailed annotations for receipts, enabling improved information extraction and classification, demonstrated by high accuracy and F1 scores using fine-tuned language models.
Contribution
The paper introduces AMuRD, a large annotated dataset for receipt analysis, and evaluates language models, achieving state-of-the-art performance in key information extraction and classification.
Findings
F1 score of 97.43% in information extraction
Accuracy of 94.99% in classification
High performance with fine-tuned LLaMA models
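For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how an F1 score for field extraction and an accuracy for item classification are commonly computed. It is not the paper's evaluation code: the exact-match criterion, the function names, and the sample fields and labels are all assumptions made for illustration.

```python
def f1_score(predicted, gold):
    """F1 over sets of (field, value) pairs, counting exact matches only.

    Assumption: a prediction is correct only if both the field name and
    its value match the annotation exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def accuracy(predicted_labels, gold_labels):
    """Fraction of items whose predicted category equals the gold category."""
    if len(predicted_labels) != len(gold_labels):
        raise ValueError("label lists must have the same length")
    if not gold_labels:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return correct / len(gold_labels)


# Hypothetical receipt item: the price is extracted incorrectly.
gold_fields = [("name", "Orange Juice"), ("brand", "Acme"), ("price", "12.50")]
pred_fields = [("name", "Orange Juice"), ("brand", "Acme"), ("price", "12.00")]
extraction_f1 = f1_score(pred_fields, gold_fields)

# Hypothetical category labels for four items: one is misclassified.
class_accuracy = accuracy(
    ["Beverages", "Snacks", "Dairy", "Dairy"],
    ["Beverages", "Snacks", "Dairy", "Bakery"],
)
```

In this toy case two of three fields match, so precision and recall are both 2/3 and the F1 is 2/3; three of four labels match, so the accuracy is 0.75.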
Abstract
The extraction of key information from receipts is a complex task that involves the recognition and extraction of text from scanned receipts. This process is crucial as it enables the retrieval of essential content and organizing it into structured documents for easy access and analysis. In this paper, we present AMuRD, a novel multilingual human-annotated dataset specifically designed for information extraction from receipts. This dataset comprises samples and addresses the key challenges in information extraction and item classification - the two critical aspects of data analysis in the retail industry. Each sample includes annotations for item names and attributes such as price, brand, and more. This detailed annotation facilitates a comprehensive understanding of each item on the receipt. Furthermore, the dataset provides classification into distinct product…
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Handwritten Text Recognition Techniques
Methods: LLaMA
