TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity

Zheyuan Yang; Liqiang Shang; Junjie Chen; Xun Yang; Chenglong Xu; Bo Yuan; Chenyuan Jiao; Yaoru Sun; Yilun Zhao

arXiv:2605.05955·cs.CL·May 8, 2026

TL;DR

TableVista is a new benchmark of 30,000 multimodal table reasoning samples (3,000 problems, each rendered in 10 visual variants) designed to evaluate foundation models' robustness to visual and structural complexity.

Contribution

The paper introduces a multi-style benchmark for assessing multimodal table reasoning models under diverse visual and structural challenges, and evaluates 29 open-source and proprietary foundation models on it.

Findings

01. Models are stable across rendering styles but struggle with complex structures.

02. Performance drops significantly in vision-only and structurally complex scenarios.

03. Current models have notable gaps in reasoning consistency with complex, multimodal tables.

Abstract

We introduce TableVista, a comprehensive benchmark for evaluating foundation models in multimodal table reasoning under visual and structural complexity. TableVista consists of 3,000 high-quality table reasoning problems, where each instance is expanded into 10 distinct visual variants through our multi-style rendering and transformation pipeline. This process encompasses diverse scenario styles, robustness perturbations, and vision-only configurations, culminating in 30,000 multimodal samples for a multi-dimensional evaluation. We conduct an extensive evaluation of 29 state-of-the-art open-source and proprietary foundation models on TableVista. Through comprehensive quantitative and qualitative analysis, we find that while evaluated models remain largely stable across diverse rendering styles, they exhibit pronounced performance degradation on complex structural layouts and vision-only…
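The benchmark arithmetic and the style of per-variant comparison described in the abstract can be illustrated with a small sketch. It is not the authors' evaluation code: the record layout, the variant names (`style_a`, `vision_only`), and the all-variants-correct consistency measure are assumptions made purely for illustration, and the toy records are invented.

```python
from collections import defaultdict

# Benchmark size as stated in the abstract.
NUM_PROBLEMS = 3_000
VARIANTS_PER_PROBLEM = 10
TOTAL_SAMPLES = NUM_PROBLEMS * VARIANTS_PER_PROBLEM  # 30,000


def accuracy_by_variant(records):
    """Return accuracy per variant.

    records: iterable of (problem_id, variant_name, is_correct) tuples.
    This layout is hypothetical, not the benchmark's actual format.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for _problem_id, variant, is_correct in records:
        counts[variant] += 1
        hits[variant] += int(is_correct)
    return {variant: hits[variant] / counts[variant] for variant in counts}


def consistency_rate(records):
    """Fraction of problems answered correctly in every variant seen.

    This is one assumed way to quantify reasoning consistency;
    the paper may define it differently.
    """
    all_correct = {}
    for problem_id, _variant, is_correct in records:
        all_correct[problem_id] = all_correct.get(problem_id, True) and bool(is_correct)
    if not all_correct:
        return 0.0
    return sum(all_correct.values()) / len(all_correct)


# Invented toy records: two problems, two hypothetical variants each.
toy_records = [
    ("p1", "style_a", True),
    ("p1", "vision_only", False),
    ("p2", "style_a", True),
    ("p2", "vision_only", True),
]

per_variant = accuracy_by_variant(toy_records)
consistency = consistency_rate(toy_records)
```

Comparing `per_variant` entries against each other shows where accuracy drops, while `consistency` captures whether a model answers the same underlying problem correctly regardless of how the table is presented.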
