MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning

Prasham Yatinkumar Titiya; Jainil Trivedi; Chitta Baral; Vivek Gupta

arXiv:2505.21771 · cs.CV · May 29, 2025

TL;DR

MMTBENCH is a comprehensive benchmark with 500 real-world multimodal tables designed to evaluate and advance vision-language models' ability to perform complex reasoning involving visual and tabular data.

Contribution

This paper introduces MMTBENCH, the first large-scale benchmark for complex multimodal table reasoning, highlighting current models' performance gaps and guiding future research.

Findings

01. State-of-the-art models perform poorly on visual-based and multi-step reasoning questions.

02. Significant performance gaps exist across different question and table types.

03. The benchmark reveals the need for better integration of vision and language in models.

Abstract

Multimodal tables, those that integrate semi-structured data with visual elements such as charts and maps, are ubiquitous across real-world domains, yet they pose a formidable challenge to current vision-language models (VLMs). While large language models (LLMs) and VLMs have demonstrated strong capabilities in text and image understanding, their performance on complex, real-world multimodal table reasoning remains unexplored. To bridge this gap, we introduce MMTBENCH (Multimodal Table Benchmark), a benchmark consisting of 500 real-world multimodal tables drawn from diverse real-world sources, with a total of 4021 question-answer pairs. MMTBENCH questions cover four question types (Explicit, Implicit, Answer Mention, and Visual Based), five reasoning types (Mathematical, Extrema Identification, Fact Verification, Vision Based, and Others), and eight table types (Single/Multiple Entity,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques