TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity
Zheyuan Yang, Liqiang Shang, Junjie Chen, Xun Yang, Chenglong Xu, Bo Yuan, Chenyuan Jiao, Yaoru Sun, Yilun Zhao

TL;DR
TableVista is a benchmark of 30,000 multimodal table reasoning samples, built from 3,000 problems each rendered in 10 visual variants, for evaluating how robust foundation models are to visual and structural complexity.
Contribution
It introduces a comprehensive, multi-style benchmark for assessing multimodal table reasoning models under diverse visual and structural challenges.
Findings
Evaluated models remain largely stable across diverse rendering styles.
Performance degrades markedly on complex structural layouts and in vision-only settings.
Current models have notable gaps in reasoning consistency with complex, multimodal tables.
Abstract
We introduce TableVista, a comprehensive benchmark for evaluating foundation models in multimodal table reasoning under visual and structural complexity. TableVista consists of 3,000 high-quality table reasoning problems, where each instance is expanded into 10 distinct visual variants through our multi-style rendering and transformation pipeline. This process encompasses diverse scenario styles, robustness perturbations, and vision-only configurations, culminating in 30,000 multimodal samples for a multi-dimensional evaluation. We conduct an extensive evaluation of 29 state-of-the-art open-source and proprietary foundation models on TableVista. Through comprehensive quantitative and qualitative analysis, we find that while evaluated models remain largely stable across diverse rendering styles, they exhibit pronounced performance degradation on complex structural layouts and vision-only…
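The sample count in the abstract follows from pairing each of the 3,000 problems with each of the 10 visual variants. The sketch below illustrates that expansion and a per-variant accuracy breakdown of the kind a multi-dimensional evaluation would need. It is not the authors' pipeline: the `Sample` type, the integer `variant_id`, and the `is_correct` callback are hypothetical names introduced here, and the actual variant categories are not enumerated in this summary.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

NUM_PROBLEMS = 3_000  # base problems, as stated in the abstract
NUM_VARIANTS = 10     # visual variants per problem, as stated in the abstract


@dataclass(frozen=True)
class Sample:
    problem_id: int
    variant_id: int  # hypothetical index; real variant names are not given here


def expand(num_problems: int, num_variants: int) -> list[Sample]:
    """Pair every base problem with every visual variant."""
    return [Sample(p, v) for p in range(num_problems) for v in range(num_variants)]


def accuracy_by_variant(
    samples: list[Sample], is_correct: Callable[[Sample], bool]
) -> dict[int, float]:
    """Group correctness by variant so variant-specific drops become visible."""
    hits: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for s in samples:
        totals[s.variant_id] += 1
        hits[s.variant_id] += int(is_correct(s))
    return {v: hits[v] / totals[v] for v in totals}


samples = expand(NUM_PROBLEMS, NUM_VARIANTS)
print(len(samples))  # 30000
```

In practice `is_correct` would compare a model's answer on the rendered table image against the gold answer; reporting accuracy per variant rather than one pooled number is what lets stability across styles be separated from degradation on specific configurations.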
