MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?

Zhitao He; Zongwei Lyu; Dazhong Chen; Dadi Guo; Yi R. Fung

arXiv:2506.06034·cs.CL·June 9, 2025

MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?

Zhitao He, Zongwei Lyu, Dazhong Chen, Dadi Guo, Yi R. Fung

PDF

Open Access

TL;DR

This paper introduces MATP-BENCH, a comprehensive multimodal theorem proving benchmark that evaluates the ability of Multimodal Large Language Models to solve formal mathematical problems across various levels and languages.

Contribution

The paper presents MATP-BENCH, a new benchmark with 1056 multimodal theorems, formalizations in multiple proof assistants, and a focus on multimodal reasoning for automated theorem proving.

Findings

01

Existing models solve only a limited subset of problems

02

MATP-BENCH reveals significant challenges in multimodal theorem proving

03

Benchmark encourages development of more capable multimodal reasoning models

Abstract

Numerous theorems, such as those in geometry, are often presented in multimodal forms (e.g., diagrams). Humans benefit from visual reasoning in such settings, using diagrams to gain intuition and guide the proof process. Modern Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in solving a wide range of mathematical problems. However, the potential of MLLMs as Automated Theorem Provers (ATPs), specifically in the multimodal domain, remains underexplored. In this paper, we introduce the Multimodal Automated Theorem Proving benchmark (MATP-BENCH), a new Multimodal, Multi-level, and Multi-language benchmark designed to evaluate MLLMs in this role as multimodal automated theorem provers. MATP-BENCH consists of 1056 multimodal theorems drawn from high school, university, and competition-level mathematics. All these multimodal problems are accompanied by…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMathematics, Computing, and Information Processing · Constraint Satisfaction and Optimization · Model-Driven Software Engineering Techniques