Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Amrith Setlur, Chirag Nagpal, Adam Fisch, Xinyang Geng, Jacob Eisenstein, Rishabh Agarwal, Alekh Agarwal, Jonathan Berant, Aviral Kumar

TL;DR
This paper proposes process advantage verifiers (PAVs): process reward models (PRMs) for large language models whose step-level feedback measures the progress each step makes toward a correct answer. Compared with outcome reward models (ORMs), this leads to improved reasoning, exploration, and efficiency.
Contribution
It proposes designing process rewards that measure progress under a prover policy distinct from the base policy, and supports this design with a theoretical characterization and empirical validation.
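One way to write the progress measure described above is as a standard RL advantage. The notation here is an assumption for illustration and is not taken from this page: \mu is the prover policy, s_h is the problem plus the steps taken so far, a_h is the candidate step, and Q^{\mu} and V^{\mu} are the probabilities that \mu reaches a correct final answer with and without that step appended.

```latex
A^{\mu}(s_h, a_h) \;=\; Q^{\mu}(s_h, a_h) \;-\; V^{\mu}(s_h)
```

A positive value means the step raised the prover's chance of finishing correctly; a negative value means it lowered that chance.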
Findings
Test-time search with PAVs is >8% more accurate than with ORMs.
Online RL with PAVs yields 5-6x sample efficiency gains over ORMs.
In test-time search, PAVs are 1.5-5x more compute-efficient than ORMs.
Abstract
A promising approach for improving reasoning in large language models is to use process reward models (PRMs). PRMs provide feedback at each step of a multi-step reasoning trace, potentially improving credit assignment over outcome reward models (ORMs) that only provide feedback at the final step. However, collecting dense, per-step human labels is not scalable, and training PRMs from automatically-labeled data has thus far led to limited gains. To improve a base policy by running search against a PRM or using it as dense rewards for reinforcement learning (RL), we ask: "How should we design process rewards?". Our key insight is that, to be effective, the process reward for a step should measure progress: a change in the likelihood of producing a correct response in the future, before and after taking the step, corresponding to the notion of step-level advantages in RL. Crucially, this…
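The notion of progress in the abstract, the change in the likelihood of a correct final answer before and after a step, can be estimated by Monte Carlo rollouts. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Rollout` interface, `toy_prover`, and its success probabilities are hypothetical stand-ins for sampling completions from a real prover policy and checking the final answer.

```python
import random
from typing import Callable, Sequence

# Hypothetical interface (an assumption, not from the paper): a rollout takes
# the steps so far and an RNG, completes the solution with the prover policy,
# and reports whether the final answer was correct.
Rollout = Callable[[Sequence[str], random.Random], bool]


def success_rate(prefix: Sequence[str], rollout: Rollout,
                 n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P(correct final answer | prefix) under the prover."""
    hits = sum(rollout(prefix, rng) for _ in range(n_samples))
    return hits / n_samples


def progress_reward(prefix: Sequence[str], step: str, rollout: Rollout,
                    n_samples: int = 256, seed: int = 0) -> float:
    """Estimated change in the prover's success rate caused by taking `step`."""
    rng = random.Random(seed)
    before = success_rate(prefix, rollout, n_samples, rng)
    after = success_rate(list(prefix) + [step], rollout, n_samples, rng)
    return after - before


def toy_prover(prefix: Sequence[str], rng: random.Random) -> bool:
    """Toy stand-in for a prover: each "good" step raises the success
    probability by 0.3 and each "bad" step lowers it by 0.1 (made-up numbers)."""
    p = 0.2 + 0.3 * prefix.count("good") - 0.1 * prefix.count("bad")
    return rng.random() < min(1.0, max(0.0, p))


if __name__ == "__main__":
    # A helpful step should receive a positive reward, a harmful one a negative reward.
    print(progress_reward([], "good", toy_prover, n_samples=2000))
    print(progress_reward(["good"], "bad", toy_prover, n_samples=2000))
```

In this sketch the reward depends only on how the prover's success rate changes, so a step that leaves the prover no better off scores near zero even if it looks plausible on its own.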
Taxonomy
Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services
Methods: Sparse Evolutionary Training · Balanced Selection
