BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations
Simone Giovannini, Fabio Coppini, Andrea Gemelli, Simone Marinai

TL;DR
BoundingDocs is a comprehensive dataset that combines multiple document AI datasets, reformulates tasks as QA, and includes spatial annotations to enhance training and evaluation of large language models for document understanding.
Contribution
The paper introduces BoundingDocs, a unified dataset with spatial annotations, reformulating document AI tasks as QA to improve model training and evaluation.
Findings
BoundingDocs improves document QA performance with spatial annotations.
Prompting techniques that include bounding-box information can improve model accuracy.
The dataset enables better training of large language models for document understanding.
Abstract
We present a unified dataset for document Question-Answering (QA), obtained by combining several public datasets related to Document AI and visually rich document understanding (VRDU). Our main contribution is twofold: on the one hand, we reformulate existing Document AI tasks, such as Information Extraction (IE), into a Question-Answering task, making the dataset a suitable resource for training and evaluating Large Language Models; on the other hand, we release the OCR of all the documents and include the exact position of the answer to be found in the document image as a bounding box. Using this dataset, we explore the impact of different prompting techniques (that might include bounding box information) on the performance of open-weight models, identifying the most effective approaches for document comprehension.
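To make the two ideas in the abstract concrete, the following is a minimal sketch of how an Information Extraction annotation might be recast as a QA record that carries the answer's bounding box, and how OCR words could be serialized into a prompt with or without box coordinates. It is an illustration under stated assumptions, not the authors' pipeline: the `Word` class, the record keys (`question`, `answer`, `bbox`), the question template, and the `text [x0,y0,x1,y1]` prompt layout are all hypothetical and are not taken from the paper or the released dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # assumed layout: (x0, y0, x1, y1) in pixels


@dataclass
class Word:
    """One OCR token with its position (hypothetical structure)."""
    text: str
    bbox: BBox


def union_bbox(boxes: List[BBox]) -> BBox:
    """Smallest box that encloses every box in the list."""
    x0s, y0s, x1s, y1s = zip(*boxes)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


def ie_to_qa(field_name: str, value_words: List[Word]) -> Dict[str, object]:
    """Recast one IE key-value annotation as a QA record.

    The question template is an assumption made for illustration only.
    """
    return {
        "question": f"What is the {field_name.replace('_', ' ')}?",
        "answer": " ".join(w.text for w in value_words),
        "bbox": union_bbox([w.bbox for w in value_words]),
    }


def build_prompt(ocr_words: List[Word], question: str, with_boxes: bool) -> str:
    """Serialize the OCR text into a prompt, optionally with coordinates."""
    if with_boxes:
        # One token per line, followed by its box coordinates.
        context = "\n".join(
            f"{w.text} [{w.bbox[0]},{w.bbox[1]},{w.bbox[2]},{w.bbox[3]}]"
            for w in ocr_words
        )
    else:
        # Plain text only: tokens joined in reading order.
        context = " ".join(w.text for w in ocr_words)
    return f"Document:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    words = [
        Word("Total", (10, 100, 60, 120)),
        Word("42.50", (70, 100, 120, 120)),
        Word("EUR", (125, 101, 160, 119)),
    ]
    record = ie_to_qa("total_amount", words[1:])
    print(record)
    print(build_prompt(words, record["question"], with_boxes=True))
```

Building the same prompt once with `with_boxes=True` and once with `with_boxes=False` gives a simple way to compare a text-only prompt against a layout-aware one for the same question, which is the kind of comparison the abstract describes.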
