TableRAG: Million-Token Table Understanding with Language Models
Si-An Chen, Lesly Miculicich, Julian Martin Eisenschlos, Zifeng Wang, Zilong Wang, Yanfei Chen, Yasuhisa Fujii, Hsuan-Tien Lin, Chen-Yu Lee, Tomas Pfister

TL;DR
TableRAG introduces a retrieval-augmented framework for large-scale table understanding with language models, improving efficiency and accuracy by focusing on relevant data and reducing prompt size.
Contribution
We propose TableRAG, a novel retrieval-augmented approach that enhances large-scale table understanding by combining query expansion with schema and cell retrieval.
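The flow of query expansion followed by schema and cell retrieval can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy table, and the hand-written expanded queries are all hypothetical, and a simple token-overlap score stands in for whatever retriever a real system would use. In practice the expanded queries would come from a language model; here they are supplied by hand.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens of any value (numbers are stringified)."""
    return set(re.findall(r"[a-z0-9]+", str(text).lower()))


def overlap(query, candidate):
    """Jaccard overlap of token sets; an assumed stand-in for a real retriever."""
    q, c = tokens(query), tokens(candidate)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def top_k(queries, candidates, k):
    """Rank (key, text) candidates by their best score over all expanded queries."""
    scored = []
    for key, text in candidates:
        score = max((overlap(q, text) for q in queries), default=0.0)
        if score > 0:  # drop candidates that match no query at all
            scored.append((score, key))
    # Sort by descending score, then by key text for a deterministic order.
    scored.sort(key=lambda item: (-item[0], str(item[1])))
    return [key for _, key in scored[:k]]


def retrieve_schema(table, queries, k):
    """Schema retrieval: pick the column names most relevant to the queries."""
    columns = list(table[0].keys())
    return top_k(queries, [(col, col) for col in columns], k)


def retrieve_cells(table, queries, k):
    """Cell retrieval: pick the most relevant distinct (column, value) pairs."""
    # Deduplicating pairs is this sketch's choice, so repeated values are scored once.
    pairs = {(col, value) for row in table for col, value in row.items()}
    return top_k(queries, [(pair, pair[1]) for pair in pairs], k)


def build_prompt(question, columns, cells):
    """Assemble a compact prompt from retrieved pieces instead of the full table."""
    lines = [
        f"Question: {question}",
        "Relevant columns: " + ", ".join(columns),
        "Relevant cells:",
    ]
    lines += [f"  {col} = {value}" for col, value in cells]
    return "\n".join(lines)


# Toy table (hypothetical data).
table = [
    {"product": "blue wallet", "price": 12.5, "store": "Lisbon"},
    {"product": "red wallet", "price": 14.0, "store": "Porto"},
    {"product": "desk lamp", "price": 30.0, "store": "Lisbon"},
]
question = "What is the average price of wallets?"

# Hand-written expansions; a real system would ask an LM to generate these.
schema_queries = ["product", "price"]
cell_queries = ["wallet"]

columns = retrieve_schema(table, schema_queries, k=2)
cells = retrieve_cells(table, cell_queries, k=2)
prompt = build_prompt(question, columns, cells)
print(prompt)
```

Only the retrieved columns and cells reach the prompt, so its size depends on the retrieval budget `k` rather than on the number of rows in the table.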
Findings
Achieves state-of-the-art performance on large-scale table understanding benchmarks.
Reduces prompt length and information loss compared to previous methods.
Demonstrates effective retrieval quality on two new million-token benchmarks built from the Arcade and BIRD-SQL datasets.
Abstract
Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to the positional bias or context length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Data Quality and Management
