GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning

Jianghangfan Zhang; Yibo Yan; Kening Zheng; Xin Zou; Song Dai; Xuming Hu

arXiv:2508.04088 · cs.CL · August 8, 2025

TL;DR

GM-PRM is a generative multimodal process reward model that works alongside multimodal large language models on multi-step mathematical reasoning: it analyzes each reasoning step and generates corrections, which the authors report yields state-of-the-art results on multimodal math benchmarks.

Contribution

Introduces GM-PRM, a generative, interpretable process reward model that actively corrects reasoning errors and enhances multimodal mathematical reasoning performance.

Findings

01. Achieves state-of-the-art results on multimodal math benchmarks.

02. Requires only 20K training samples for effective performance.

03. Provides detailed, step-level interpretability and correction capabilities.

Abstract

Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities but often struggle with complex, multi-step mathematical reasoning, where minor errors in visual perception or logical deduction can lead to complete failure. While Process Reward Models (PRMs) offer step-by-step supervision, existing multimodal PRMs are limited to being binary verifiers that can identify but not correct errors, offering little explanatory power. To address these deficiencies, we introduce the Generative Multimodal Process Reward Model (GM-PRM), a novel paradigm that transforms the PRM from a passive judge into an active reasoning collaborator. Instead of a simple scalar score, GM-PRM provides a fine-grained, interpretable analysis of each reasoning step, evaluating its step intent, visual alignment, and logical soundness. More critically, GM-PRM is trained to generate a corrected version of…
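
To make the step-level analysis described in the abstract more concrete, here is a minimal Python sketch of what a per-step record could look like. It is an illustration only, not the authors' implementation: the class `StepAnalysis`, its field names, the helper `first_flawed_step`, and the toy geometry steps are all hypothetical. The three judged aspects (step intent, visual alignment, logical soundness) come from the abstract; the `correction` field is a placeholder, since the abstract is truncated before it says exactly what is corrected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepAnalysis:
    """Hypothetical record for one reasoning step (all names are assumptions)."""
    step_text: str
    step_intent: str                  # what the step is trying to accomplish
    visually_aligned: bool            # does the step agree with the image?
    logically_sound: bool             # does the step follow from earlier steps?
    correction: Optional[str] = None  # placeholder for a generated correction

    @property
    def is_correct(self) -> bool:
        # A step passes only if both the visual and the logical check pass.
        return self.visually_aligned and self.logically_sound


def first_flawed_step(steps: List[StepAnalysis]) -> Optional[int]:
    """Return the index of the first step failing either check, or None."""
    for i, step in enumerate(steps):
        if not step.is_correct:
            return i
    return None


# Toy example: the second step misreads a value from the figure.
steps = [
    StepAnalysis("Read the base as 6 from the figure.", "extract a value", True, True),
    StepAnalysis(
        "Read the height as 5 from the figure.",
        "extract a value",
        visually_aligned=False,
        logically_sound=True,
        correction="Read the height as 4 from the figure.",
    ),
    StepAnalysis("Area = 6 * 5 / 2 = 15.", "compute the area", True, True),
]

print(first_flawed_step(steps))
```

Unlike a single scalar score, a structured record of this kind keeps the reason for a failure (visual versus logical) attached to the step, which is the interpretability benefit the abstract emphasizes.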
