TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering

Tung Sum Thomas Kwok; Xinyu Wang; Xiaofeng Lin; Peng Lu; Chunhe Wang; Changlun Li; Hanwei Wu; Nan Tang; Elisa Kreiss; Guang Cheng

arXiv:2604.03393·cs.AI·April 7, 2026

TL;DR

TABQAWORLD is a training-free multimodal reasoning framework that dynamically switches representations and leverages table metadata to improve multi-turn table question answering accuracy and efficiency.

Contribution

The paper introduces a framework that jointly optimizes tabular actions through representation and estimation, reducing errors and latency in multi-turn table reasoning.

Findings

01

Achieves a 4.87% accuracy improvement over baselines.

02

Reduces inference latency by 33.35%.

03

Outperforms static methods in multi-turn reasoning tasks.

Abstract

Multimodal reasoning has emerged as a powerful framework for enhancing the capabilities of reasoning models. While multi-turn table reasoning methods have improved accuracy through tool use and reward modeling, they rely on fixed text serialization for table state readouts. This introduces representation errors in table encoding that accumulate significantly over multiple turns. Tabular grounding methods alleviate this accumulation, but at the expense of inference compute and cost, rendering real-world deployment impractical. To address this, we introduce TABQAWORLD, a table reasoning framework that jointly optimizes tabular action through representation and estimation. For representation, TABQAWORLD employs an action-conditioned multimodal selection policy, which dynamically switches between visual and textual representations to maximize table state readout…
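The abstract does not say how the selection policy is implemented. As a rough illustration of what action-conditioned switching between textual and visual table readouts could look like, the Python sketch below picks a representation from a fixed lookup keyed by the action. Everything in it is an assumption made for illustration: the action names, the `PREFERRED_MODALITY` mapping, and the helper functions are hypothetical and are not taken from the paper, whose policy may be learned or estimated quite differently.

```python
# Illustrative sketch only: a hand-written, rule-based stand-in for an
# action-conditioned representation selector. Not the TABQAWORLD policy.

# Hypothetical mapping from a table action to the modality used for the readout.
PREFERRED_MODALITY = {
    "lookup_cell": "text",
    "filter_rows": "text",
    "compare_columns": "image",
    "read_layout": "image",
}


def serialize_as_text(table):
    """Serialize a table (list of rows) as pipe-delimited text, one row per line."""
    return "\n".join(" | ".join(str(cell) for cell in row) for row in table)


def render_as_image_stub(table):
    """Stand-in for an image renderer; returns a descriptive placeholder string."""
    n_rows = len(table)
    n_cols = len(table[0]) if table else 0
    return f"<image of {n_rows}x{n_cols} table>"


def select_representation(action, table, default="text"):
    """Return (modality, readout) for the given action.

    Actions missing from PREFERRED_MODALITY fall back to `default`.
    """
    modality = PREFERRED_MODALITY.get(action, default)
    if modality == "image":
        return modality, render_as_image_stub(table)
    return "text", serialize_as_text(table)


if __name__ == "__main__":
    table = [["city", "population"], ["Oslo", 709000]]
    for action in ("lookup_cell", "read_layout", "unknown_action"):
        print(action, "->", select_representation(action, table))
```

A static lookup is the simplest way to make the readout depend on the action; it only shows the interface (action in, modality and readout out), not how such a policy would be chosen or tuned.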
