TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning
Xiaohan Yu, Pu Jian, Chong Chen

TL;DR
TableRAG introduces an SQL-based framework that enhances reasoning over heterogeneous documents containing both text and tables, overcoming limitations of previous flattening methods and achieving state-of-the-art results.
Contribution
The paper presents a novel SQL-based retrieval-augmented generation framework specifically designed for multi-modal, heterogeneous documents, along with a new benchmark for evaluation.
Findings
Outperforms existing baselines on public datasets
Sets new state-of-the-art on HeteQA benchmark
Effectively handles multi-hop reasoning over heterogeneous data
Abstract
Retrieval-Augmented Generation (RAG) has demonstrated considerable effectiveness in open-domain question answering. However, when applied to heterogeneous documents, comprising both textual and tabular components, existing RAG approaches exhibit critical limitations. The prevailing practice of flattening tables and chunking strategies disrupts the intrinsic tabular structure, leads to information loss, and undermines the reasoning capabilities of LLMs in multi-hop, global queries. To address these challenges, we propose TableRAG, an SQL-based framework that unifies textual understanding and complex manipulations over tabular data. TableRAG iteratively operates in four steps: context-sensitive query decomposition, text retrieval, SQL programming and execution, and compositional intermediate answer generation. We also develop HeteQA, a novel benchmark designed to evaluate the multi-hop…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior
MethodsLayer Normalization · Linear Warmup With Linear Decay · Refunds@Expedia|||How do I get a full refund from Expedia? · Attention Dropout · Byte Pair Encoding · Softmax · Linear Layer · Dropout · Dense Connections · Attention Is All You Need
