Multimodal Tabular Reasoning with Privileged Structured Information

Jun-Peng Jiang; Yu Xia; Hai-Long Sun; Shiyin Lu; Qing-Guo Chen; Weihua Luo; Kaifu Zhang; De-Chuan Zhan; Han-Jia Ye

arXiv:2506.04088 · cs.LG · June 5, 2025


TL;DR

This paper introduces Turbo, a framework that improves the ability of multimodal large language models (MLLMs) to reason over table images by leveraging privileged structured information during training, achieving state-of-the-art results.

Contribution

The paper proposes Turbo, a novel multimodal reasoning framework that uses privileged structured data and a reasoning trace generator to enhance reasoning over table images.

Findings

1. Turbo improves on the previous state of the art (SOTA) by 7.2%.

2. The framework effectively aligns structured information with visual data.

3. A limited amount of training data (9k samples) suffices for high performance.

Abstract

Tabular reasoning involves multi-step information extraction and logical inference over tabular data. While recent advances have leveraged large language models (LLMs) for reasoning over structured tables, such high-quality textual representations are often unavailable in real-world settings, where tables typically appear as images. In this paper, we tackle the task of tabular reasoning from table images, leveraging privileged structured information available during training to enhance multimodal large language models (MLLMs). The key challenges lie in the complexity of accurately aligning structured information with visual representations, and in effectively transferring structured reasoning skills to MLLMs despite the input modality gap. To address these, we introduce TabUlar Reasoning with Bridged infOrmation (Turbo), a new framework for multimodal tabular reasoning with…


Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Speech and dialogue systems