MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Peijie Wang; Zhong-Zhi Li; Fei Yin; Xin Yang; Dekang Ran; Cheng-Lin Liu

arXiv:2502.20808·cs.AI·August 4, 2025

TL;DR

MV-MATH introduces a benchmark of multi-visual math problems for evaluating multimodal models' reasoning in realistic, complex visual-text scenarios, and reveals significant challenges for current models.

Contribution

We created MV-MATH, a comprehensive dataset of multi-visual math problems from real-world scenarios, and evaluated MLLMs' performance, highlighting existing gaps and challenges.

Findings

01

MLLMs perform substantially worse than humans on MV-MATH.

02

Current models struggle with multi-visual reasoning tasks.

03

Error analysis reveals specific reasoning challenges.

Abstract

Multimodal Large Language Models (MLLMs) have shown promising capabilities in mathematical reasoning within visual contexts across various datasets. However, most existing multimodal math benchmarks are limited to single-visual contexts, which diverges from the multi-visual scenarios commonly encountered in real-world mathematical applications. To address this gap, we introduce MV-MATH: a meticulously curated dataset of 2,009 high-quality mathematical problems. Each problem integrates multiple images interleaved with text, derived from authentic K-12 scenarios, and enriched with detailed annotations. MV-MATH includes multiple-choice, free-form, and multi-step questions, covering 11 subject areas across 3 difficulty levels, and serves as a comprehensive and rigorous benchmark for assessing MLLMs' mathematical reasoning in multi-visual contexts. Through extensive experimentation, we…
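The abstract describes each item as several images interleaved with text, tagged with a question type (multiple-choice, free-form, or multi-step), one of 11 subject areas, and one of 3 difficulty levels. The sketch below shows one way such a record and a per-type accuracy breakdown could be represented. It is a minimal illustration under stated assumptions, not the official MV-MATH schema or evaluation protocol: the class name `MVMathProblem`, every field name, the `<image1>`-style placeholders, and the case-insensitive exact-match scoring are all hypothetical. Multi-step questions in particular may need sub-answer scoring that plain exact match does not capture.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Literal

# Question types named in the abstract; the string labels are hypothetical.
QuestionType = Literal["multiple_choice", "free_form", "multi_step"]


@dataclass
class MVMathProblem:
    """Hypothetical record layout for one multi-visual problem (not the official schema)."""
    problem_id: str
    question: str             # text with image placeholders, e.g. "<image1>" (assumed convention)
    image_paths: list[str]    # the multiple images interleaved with the text
    question_type: QuestionType
    subject: str              # one of the 11 subject areas (labels not given here)
    difficulty: int           # one of 3 difficulty levels, encoded here as 1..3
    answer: str

    def __post_init__(self) -> None:
        # The abstract states each problem integrates multiple images.
        if len(self.image_paths) < 2:
            raise ValueError("a multi-visual problem needs at least two images")
        if self.difficulty not in (1, 2, 3):
            raise ValueError("difficulty must be one of the 3 levels (1-3)")


def accuracy_by_type(problems: list[MVMathProblem],
                     predictions: dict[str, str]) -> dict[str, float]:
    """Case-insensitive exact-match accuracy grouped by question type (illustrative only)."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for p in problems:
        total[p.question_type] += 1
        pred = predictions.get(p.problem_id, "")  # a missing prediction counts as wrong
        if pred.strip().lower() == p.answer.strip().lower():
            correct[p.question_type] += 1
    return {t: correct[t] / total[t] for t in total}
```

Grouping by question type mirrors the breakdown the abstract names; the same loop could key on `subject` or `difficulty` instead to compare performance across subject areas or levels.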
