FreePRM: Training Process Reward Models Without Ground Truth Process Labels

Lin Sun; Chuang Liu; Xiaofeng Ma; Tao Yang; Weijia Lu; Ning Wu

arXiv:2506.03570 · cs.CL · June 5, 2025

TL;DR

FreePRM presents a weakly supervised method for training Process Reward Models without ground-truth step labels, using pseudo labels and Buffer Probability to achieve high performance and reduce annotation costs.

Contribution

Introduces FreePRM, a weakly supervised framework that trains PRMs without ground-truth step-level labels, using pseudo labels derived from final-outcome correctness and Buffer Probability to reduce the resulting label noise.

Findings

01

Achieves an average F1 score of 53.0% on ProcessBench, outperforming a fully supervised PRM trained on Math-Shepherd by +24.1%.

02

Outperforms existing open-source PRMs by 10.9% to 24.6%.

03

Reduces reliance on costly step-level annotations.

Abstract

Recent advancements in Large Language Models (LLMs) have demonstrated that Process Reward Models (PRMs) play a crucial role in enhancing model performance. However, training PRMs typically requires step-level labels, either manually annotated or automatically generated, which can be costly and difficult to obtain at scale. To address this challenge, we introduce FreePRM, a weakly supervised framework for training PRMs without access to ground-truth step-level labels. FreePRM first generates pseudo step-level labels based on the correctness of the final outcome, and then employs Buffer Probability to eliminate the impact of the noise inherent in pseudo labeling. Experimental results show that FreePRM achieves an average F1 score of 53.0% on ProcessBench, outperforming a fully supervised PRM trained on Math-Shepherd by +24.1%. Compared to other open-source PRMs, FreePRM outperforms upon…
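The first stage described in the abstract, deriving pseudo step-level labels from the correctness of the final outcome, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `LabeledStep` type, the `pseudo_label_steps` function, and the exact-string answer check are assumptions chosen for illustration. The sketch covers only the labeling step, not Buffer Probability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledStep:
    """One reasoning step with a (noisy) pseudo label. Hypothetical type."""
    text: str
    pseudo_label: int  # 1 = assumed correct, 0 = assumed incorrect


def pseudo_label_steps(
    steps: List[str], final_answer: str, reference_answer: str
) -> List[LabeledStep]:
    """Give every step the label implied by the final outcome.

    Assumption: outcome correctness is decided by an exact string match
    after stripping whitespace; a real pipeline would likely need a more
    robust answer checker.
    """
    outcome_correct = final_answer.strip() == reference_answer.strip()
    label = 1 if outcome_correct else 0
    # Every step inherits the outcome label, so a sound step inside a
    # failed solution is still marked 0. This is the label noise that
    # the abstract says Buffer Probability is meant to handle.
    return [LabeledStep(text=s, pseudo_label=label) for s in steps]


if __name__ == "__main__":
    solution = ["2 + 2 = 4", "4 * 3 = 12", "12 - 1 = 10"]
    for step in pseudo_label_steps(solution, final_answer="10", reference_answer="11"):
        print(step.pseudo_label, step.text)
```

In the usage example, the first two steps are arithmetically sound but still receive label 0 because the final answer is wrong, which shows why outcome-derived labels are noisy at the step level.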
