DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning

Qi Cao; Ruiyi Wang; Ruiyi Zhang; Sai Ashish Somayajula; Pengtao Xie

arXiv:2505.20241 · cs.LG · November 5, 2025

TL;DR

DreamPRM introduces a domain-reweighted training framework for multimodal Process Reward Models, enhancing their generalization and reasoning capabilities across diverse multimodal tasks by addressing dataset quality imbalance.

Contribution

The paper proposes a bi-level optimization approach for training multimodal PRMs that prioritizes high-quality data and improves generalization on multimodal reasoning tasks.
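To make the idea of bi-level domain reweighting concrete, here is a minimal toy sketch. It is not the paper's implementation: the scalar "model" `theta`, the quadratic per-domain losses, the one-step unrolled meta-gradient, and every name and hyperparameter (`reweight_domains`, `domain_targets`, `meta_target`, `lr`, `meta_lr`) are assumptions chosen purely for illustration. The lower level takes a gradient step on the domain-weighted training loss; the upper level adjusts the domain weights so that the updated model does better on a small held-out meta set.

```python
def reweight_domains(domain_targets, meta_target, steps=500, lr=0.05, meta_lr=0.05):
    """Toy bi-level domain reweighting with a scalar model (illustrative only).

    Each domain k has loss (theta - domain_targets[k])**2, and the meta
    (held-out) loss is (theta - meta_target)**2. All names are hypothetical.
    """
    theta = 0.0
    weights = [1.0] * len(domain_targets)
    for _ in range(steps):
        # Lower level: one gradient step on the domain-weighted training loss.
        grads = [2.0 * (theta - c) for c in domain_targets]
        theta_next = theta - lr * sum(w * g for w, g in zip(weights, grads))

        # Upper level: differentiate the meta loss at theta_next with respect
        # to each domain weight through the single lower-level step
        # (d theta_next / d w_k = -lr * grads[k]).
        meta_grad = 2.0 * (theta_next - meta_target)
        weight_grads = [meta_grad * (-lr * g) for g in grads]

        # Gradient step on the weights, clipped so they stay non-negative.
        weights = [max(0.0, w - meta_lr * wg) for w, wg in zip(weights, weight_grads)]
        theta = theta_next
    return theta, weights


if __name__ == "__main__":
    # Two domains that roughly agree with the meta set and one that does not.
    theta, weights = reweight_domains([1.0, 1.2, 5.0], meta_target=1.0)
    print("theta:", theta)
    print("domain weights:", weights)
```

In this toy setting the domain whose target disagrees with the meta set is down-weighted, which mirrors the intuition of prioritizing higher-quality domains. A real PRM would replace the scalar with network parameters and the analytic gradients with automatic differentiation.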

Findings

01. DreamPRM outperforms existing methods on multiple benchmarks.

02. Domain reweighting improves PRM accuracy and robustness.

03. DreamPRM generalizes better across diverse multimodal reasoning tasks.

Abstract

Reasoning has substantially improved the performance of large language models (LLMs) on complicated tasks. Central to the current reasoning studies, Process Reward Models (PRMs) offer a fine-grained evaluation of intermediate reasoning steps and guide the reasoning process. However, extending PRMs to multimodal large language models (MLLMs) introduces challenges. Since multimodal reasoning covers a wider range of tasks compared to text-only scenarios, the resulting distribution shift from the training to testing sets is more severe, leading to greater generalization difficulty. Training a reliable multimodal PRM, therefore, demands large and diverse datasets to ensure sufficient coverage. However, current multimodal reasoning datasets suffer from a marked quality imbalance, which degrades PRM performance and highlights the need for an effective data selection strategy. To address the…

Videos

DreamPRM: Domain-reweighted Process Reward Model for Multimodal Reasoning · SlidesLive

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications