TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering
Tung Sum Thomas Kwok, Xinyu Wang, Xiaofeng Lin, Peng Lu, Chunhe Wang, Changlun Li, Hanwei Wu, Nan Tang, Elisa Kreiss, Guang Cheng

TL;DR
TABQAWORLD is a training-free multimodal reasoning framework that dynamically switches representations and leverages table metadata to improve multi-turn table question answering accuracy and efficiency.
Contribution
It introduces a joint optimization approach for tabular action representation and estimation, reducing errors and latency in multi-turn table reasoning.
Findings
Achieves 4.87% accuracy improvement over baselines.
Reduces inference latency by 33.35%.
Outperforms static methods in multi-turn reasoning tasks.
Abstract
Multimodal reasoning has emerged as a powerful framework for enhancing the capabilities of reasoning models. While multi-turn table reasoning methods have improved accuracy through tool use and reward modeling, they rely on fixed text serialization for table state readouts. This introduces representation errors in table encoding that accumulate significantly over multiple turns. Tabular grounding methods alleviate this accumulation, but at the expense of inference compute and cost, rendering real-world deployment impractical. To address this, we introduce TABQAWORLD, a table reasoning framework that jointly optimizes tabular action through representation and estimation. For representation, TABQAWORLD employs an action-conditioned multimodal selection policy, which dynamically switches between visual and textual representations to maximize table state readout…
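To make the idea of per-action representation switching concrete, here is a minimal Python sketch. It is not the TABQAWORLD policy, whose details are not given in this excerpt. The action names in `LAYOUT_SENSITIVE_ACTIONS`, the `TableState` fields, and the rule that layout-sensitive actions or tables with merged cells go to the visual view are all hypothetical assumptions. `serialize_markdown` stands in for the kind of fixed text serialization the abstract contrasts against.

```python
from dataclasses import dataclass
from enum import Enum


class Representation(Enum):
    TEXT = "text"
    IMAGE = "image"


# Hypothetical action names; the paper's actual action space is not specified here.
LAYOUT_SENSITIVE_ACTIONS = {"locate_cell", "read_merged_header", "compare_columns"}


@dataclass
class TableState:
    """Hypothetical summary of the current table state (metadata only)."""
    n_rows: int
    n_cols: int
    has_merged_cells: bool


def select_representation(action: str, state: TableState) -> Representation:
    """Choose the readout modality for the next turn, conditioned on the action.

    Assumed rule: actions that depend on spatial layout, or tables whose
    structure flattens poorly into text (merged cells), use the image view;
    everything else uses the cheaper text serialization.
    """
    if action in LAYOUT_SENSITIVE_ACTIONS or state.has_merged_cells:
        return Representation.IMAGE
    return Representation.TEXT


def serialize_markdown(header: list, rows: list) -> str:
    """A fixed text serialization of a table as a Markdown grid."""
    lines = [
        "| " + " | ".join(map(str, header)) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(map(str, row)) + " |" for row in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    state = TableState(n_rows=10, n_cols=5, has_merged_cells=False)
    print(select_representation("locate_cell", state))  # Representation.IMAGE
    print(select_representation("sum_column", state))   # Representation.TEXT
    print(serialize_markdown(["a", "b"], [[1, 2]]))
```

In a real system a learned or model-driven policy would replace these hand-written rules; the sketch only shows that the readout modality is chosen per action rather than fixed for every turn.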
