Efficient Table Retrieval and Understanding with Multimodal Large Language Models
Zhuoyan Xu, Haoyang Fang, Boran Han, Bonan Min, Bernie Wang, Cuixiong Hu, Shuai Zhang

TL;DR
This paper introduces TabRAG, a framework that combines visual-text retrieval and large language models to identify and reason over relevant table images from large collections, significantly improving accuracy in real-world table understanding tasks.
Contribution
The paper presents a novel retrieval and reasoning framework for large collections of table images using multimodal models, addressing practical challenges in table understanding.
Findings
TabRAG outperforms existing methods by 7.0% in retrieval recall.
It achieves 6.1% higher answer accuracy than existing methods.
Its effectiveness is demonstrated on a large-scale dataset with 88,161 training samples.
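For readers unfamiliar with the retrieval metric, the sketch below shows one common way to compute recall@k when each query has a single relevant table. This summary does not say which recall definition or cutoff the paper uses, so the function names, the single-gold-table setup, and the value of k are all assumptions made for illustration.

```python
from typing import Sequence


def recall_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int) -> float:
    """Return 1.0 if the relevant table is among the top-k results, else 0.0.

    Assumes exactly one relevant table per query (an illustrative assumption).
    """
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def mean_recall_at_k(
    rankings: Sequence[Sequence[str]], gold_ids: Sequence[str], k: int
) -> float:
    """Average recall@k over a set of queries."""
    if not rankings or len(rankings) != len(gold_ids):
        raise ValueError("rankings and gold_ids must be non-empty and equal in length")
    hits = [recall_at_k(r, g, k) for r, g in zip(rankings, gold_ids)]
    return sum(hits) / len(hits)


# Toy data: two queries, each with its ranked table ids and one gold table id.
toy_rankings = [["a", "b", "c"], ["c", "a", "b"]]
toy_gold = ["b", "b"]
toy_recall_at_2 = mean_recall_at_k(toy_rankings, toy_gold, k=2)
```

With this toy data the gold table is in the top two for the first query only, so the mean recall@2 is 0.5.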
Abstract
Tabular data is frequently captured in image form across a wide range of real-world scenarios such as financial reports, handwritten records, and document scans. These visual representations pose unique challenges for machine understanding, as they combine both structural and visual complexities. While recent advances in Multimodal Large Language Models (MLLMs) show promising results in table understanding, they typically assume the relevant table is readily available. However, a more practical scenario involves identifying and reasoning over relevant tables from large-scale collections to answer user queries. To address this gap, we propose TabRAG, a framework that enables MLLMs to answer queries over large collections of table images. Our approach first retrieves candidate tables using jointly trained visual-text foundation models, then leverages MLLMs to perform fine-grained…
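The two stages named in the abstract, candidate retrieval followed by a second pass over the candidates, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are hand-written toy vectors standing in for a visual-text encoder, and `rerank_score` is a hypothetical callable standing in for whatever the MLLM stage computes, which the truncated abstract does not specify.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec: Vector, table_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: rank table ids by embedding similarity and keep the top k."""
    ranked = sorted(
        table_vecs,
        key=lambda tid: cosine(query_vec, table_vecs[tid]),
        reverse=True,
    )
    return ranked[:k]


def select_table(
    query_vec: Vector,
    table_vecs: Dict[str, Vector],
    k: int,
    rerank_score: Callable[[str], float],
) -> str:
    """Stage 2: pick the best candidate according to a (hypothetical) second-stage scorer."""
    candidates = retrieve_top_k(query_vec, table_vecs, k)
    return max(candidates, key=rerank_score)


# Toy embeddings (assumed values); a real system would obtain these from an encoder.
toy_tables = {"t1": [1.0, 0.0], "t2": [0.8, 0.6], "t3": [0.0, 1.0]}
toy_query = [1.0, 0.2]
# Hypothetical second-stage scores, standing in for model output.
toy_scores = {"t1": 0.3, "t2": 0.9, "t3": 1.0}

toy_candidates = retrieve_top_k(toy_query, toy_tables, k=2)
toy_choice = select_table(toy_query, toy_tables, k=2, rerank_score=toy_scores.__getitem__)
```

In this toy run the first stage keeps `t1` and `t2`, and the second stage selects `t2`. Table `t3` has the highest second-stage score but is never considered, which shows why first-stage retrieval recall bounds the accuracy of the whole pipeline.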
Taxonomy
Topics: Handwritten Text Recognition Techniques · Data Quality and Management · Image Retrieval and Classification Techniques
