MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?
Zhitao He, Zongwei Lyu, Dazhong Chen, Dadi Guo, Yi R. Fung

TL;DR
This paper introduces MATP-BENCH, a comprehensive multimodal theorem proving benchmark that evaluates the ability of Multimodal Large Language Models to solve formal mathematical problems across various levels and languages.
Contribution
The paper presents MATP-BENCH, a new benchmark with 1056 multimodal theorems, formalizations in multiple proof assistants, and a focus on multimodal reasoning for automated theorem proving.
Findings
Existing models solve only a limited subset of problems
MATP-BENCH reveals significant challenges in multimodal theorem proving
Benchmark encourages development of more capable multimodal reasoning models
Abstract
Numerous theorems, such as those in geometry, are often presented in multimodal forms (e.g., diagrams). Humans benefit from visual reasoning in such settings, using diagrams to gain intuition and guide the proof process. Modern Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in solving a wide range of mathematical problems. However, the potential of MLLMs as Automated Theorem Provers (ATPs), specifically in the multimodal domain, remains underexplored. In this paper, we introduce the Multimodal Automated Theorem Proving benchmark (MATP-BENCH), a new Multimodal, Multi-level, and Multi-language benchmark designed to evaluate MLLMs in this role as multimodal automated theorem provers. MATP-BENCH consists of 1056 multimodal theorems drawn from high school, university, and competition-level mathematics. All these multimodal problems are accompanied by…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics, Computing, and Information Processing · Constraint Satisfaction and Optimization · Model-Driven Software Engineering Techniques
