Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Yifan Wang, Xinzhe Ni, Zicheng Lin, Songtao Jiang, Yiyao Yu, Chufan Shi, Lei Wang, Ruihang Chu, Jin Zeng, Yujiu Yang

TL;DR
This paper introduces URSA, a three-stage framework that improves multimodal mathematical reasoning in Multimodal Large Language Models (MLLMs) by combining new reasoning datasets, a process reward model, and process-supervised reinforcement learning.
Contribution
The work pioneers the integration of process reward models into multimodal reasoning, introduces new datasets, and develops a novel training framework for improved multimodal mathematical reasoning.
Findings
URSA-8B-PS-GRPO outperforms Gemma3-12B and GPT-4o by 8.4% and 2.7%, respectively, on average across 6 benchmarks.
Constructed high-quality multimodal reasoning datasets MMathCoT-1M and DualMath-1.1M.
Proposed a new online RL method, PS-GRPO, for multimodal process supervision.
Abstract
Process Reward Models (PRMs) have shown promise in enhancing the mathematical reasoning capabilities of Large Language Models (LLMs) through Test-Time Scaling (TTS). However, their integration into multimodal reasoning remains largely unexplored. In this work, we take the first step toward unlocking the potential of PRMs in multimodal mathematical reasoning. We identify three key challenges: (1) the scarcity of high-quality reasoning data constrains the capabilities of foundation Multimodal Large Language Models (MLLMs), which imposes further limitations on the upper bounds of TTS and reinforcement learning (RL); (2) a lack of automated methods for process labeling within multimodal contexts persists; (3) the employment of process rewards in unimodal RL faces issues like reward hacking, which may extend to multimodal scenarios. To address these issues, we introduce URSA, a three-stage…
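To make the Test-Time Scaling idea concrete, here is a minimal sketch of best-of-N selection guided by a process reward model. It is an illustration under stated assumptions, not URSA's implementation: the names `best_of_n` and `toy_step_scorer` are hypothetical, the scorer is a stand-in for a trained PRM, and scoring a solution by its weakest step (the minimum) is one common convention rather than something the abstract specifies.

```python
from typing import Callable, List, Sequence

# A step scorer maps (all steps of a solution, index of the step) to a
# correctness score in [0, 1]. In practice this would be a trained PRM;
# here it is a hypothetical stand-in.
StepScorer = Callable[[List[str], int], float]


def best_of_n(candidates: Sequence[List[str]], score_step: StepScorer) -> int:
    """Return the index of the candidate whose weakest step scores highest.

    Assumption: a solution's score is the minimum of its step scores.
    Other aggregations (product, last-step score) are equally plausible.
    """
    if not candidates:
        raise ValueError("need at least one candidate solution")

    def solution_score(steps: List[str]) -> float:
        if not steps:
            return float("-inf")  # an empty solution can never win
        return min(score_step(steps, i) for i in range(len(steps)))

    return max(range(len(candidates)), key=lambda i: solution_score(candidates[i]))


def toy_step_scorer(steps: List[str], index: int) -> float:
    """Hypothetical scorer: treats any step containing '???' as likely wrong."""
    return 0.1 if "???" in steps[index] else 0.9


# Two sampled solutions to the same problem; the first has a dubious step.
samples = [
    ["Read the side lengths from the figure.", "Area = ??? * 4"],
    ["Read the side lengths from the figure.", "Area = 3 * 4 = 12"],
]
chosen = best_of_n(samples, toy_step_scorer)
```

In this toy run `chosen` is the index of the second sample, because its lowest step score (0.9) exceeds that of the first (0.1). Sampling more candidates per problem is what scales test-time compute; the quality of the step scorer then limits how much that extra sampling helps.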
Taxonomy
Topics: Semantic Web and Ontologies
