TabRAG: Improving Tabular Document Question Answering for Retrieval Augmented Generation via Structured Representations
Jacob Si, Mike Qu, Michelle Lee, Marek Rei, Yingzhen Li

TL;DR
TabRAG is a parsing-based RAG framework for question answering on tabular documents. It preserves the two-dimensional structure of tables and uses a vision language model to extract them into a structured representation.
Contribution
The paper presents a framework that combines layout segmentation, hierarchical table parsing with a vision language model, and in-context learning to improve question answering on tabular documents.
Findings
Outperforms existing parsing techniques on multiple benchmarks.
Effective in handling various table styles and formats.
Improves the accuracy of question answering on tabular data.
Abstract
Incorporating external knowledge bases in traditional retrieval-augmented generation (RAG) relies on parsing the document, followed by querying a language model with the parsed information via in-context learning. While effective for text-based documents, question answering on tabular documents often fails to generate plausible responses. Standard parsing techniques lose the two-dimensional structural semantics critical for cell interpretation. In this work, we present TabRAG, a parsing-based RAG framework designed to improve tabular document question answering via structured representations. Our framework consists of layout segmentation that decomposes the document inputs into a series of components, enabling fine-grained extraction. Subsequently, a vision language model parses and extracts the document tables into a hierarchically structured representation. In order to cater various…
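To illustrate the abstract's claim that standard parsing loses two-dimensional structure, here is a minimal Python sketch. It is not the authors' representation or code: the `Cell` record, the helper names, and the sample grid are all hypothetical, chosen only to contrast a flattened text dump with a representation that keeps each value tied to its row header and its full column-header path.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical record: one table value plus the headers that give it meaning."""
    row_header: str
    column_path: tuple[str, ...]
    value: str


def flatten_naive(grid: list[list[str]]) -> str:
    """Join all non-empty cell text row by row, as a plain-text parser might."""
    return " ".join(text for row in grid for text in row if text)


def to_cells(grid: list[list[str]], header_rows: int = 2) -> list[Cell]:
    """Attach every body value to its row header and full column-header path."""
    # Forward-fill empty header cells so a merged header spans its columns.
    filled = []
    for row in grid[:header_rows]:
        last, out = "", []
        for text in row:
            last = text or last
            out.append(last)
        filled.append(out)

    cells = []
    for row in grid[header_rows:]:
        for col in range(1, len(row)):
            path = tuple(header[col] for header in filled)
            cells.append(Cell(row[0], path, row[col]))
    return cells


def render(cells: list[Cell]) -> str:
    """Serialise the cells as one line each, suitable for pasting into a prompt."""
    return "\n".join(
        f"{c.row_header} | {' > '.join(c.column_path)} | {c.value}" for c in cells
    )


# Assumed sample table: a merged year header above Revenue/Cost sub-columns.
GRID = [
    ["", "2023", "", "2024", ""],
    ["", "Revenue", "Cost", "Revenue", "Cost"],
    ["North", "10", "7", "12", "8"],
    ["South", "20", "15", "25", "18"],
]

if __name__ == "__main__":
    print(flatten_naive(GRID))
    print(render(to_cells(GRID)))
```

In the flattened string, nothing indicates which year or measure the value 12 belongs to, whereas the structured form keeps it attached to North, 2024, and Revenue. A real pipeline would obtain the grid from layout segmentation and a vision language model rather than from a hand-written list.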
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Multimodal Machine Learning Applications
