LLM Reasoning with Process Rewards for Outcome-Guided Steps

Mohammad Rezaei; Jens Lehmann; Sahar Vahdati

arXiv:2604.02341·cs.LG·April 6, 2026


TL;DR

This paper introduces PROGRS, a framework that improves mathematical reasoning in large language models by combining process reward model (PRM) scores with outcome correctness while keeping outcome correctness the dominant training signal.

Contribution

PROGRS treats process reward model (PRM) scores as relative preferences within outcome groups rather than as absolute rewards. It applies outcome-conditioned centering to these scores and improves reasoning accuracy without adding trainable components.
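
The idea of centering PRM scores within outcome groups can be illustrated with a small sketch. This is not the paper's exact formulation, which is not given on this page. The function name `outcome_centered_rewards`, the `weight` parameter, the 0/1 outcome reward, and the assumption that PRM scores lie in [0, 1] are all hypothetical choices made for illustration.

```python
from statistics import mean


def outcome_centered_rewards(prm_scores, is_correct, weight=0.5):
    """Combine outcome rewards with PRM scores centered per outcome group.

    ASSUMPTIONS (not from the paper): PRM scores lie in [0, 1], the outcome
    reward is 1.0 for a correct final answer and 0.0 otherwise, and `weight`
    is at most 0.5 so that the outcome term always dominates.
    """
    if len(prm_scores) != len(is_correct):
        raise ValueError("prm_scores and is_correct must have equal length")

    # Mean PRM score of the correct rollouts and of the incorrect rollouts.
    group_means = {}
    for outcome in (True, False):
        members = [s for s, c in zip(prm_scores, is_correct) if c == outcome]
        if members:
            group_means[outcome] = mean(members)

    rewards = []
    for score, correct in zip(prm_scores, is_correct):
        outcome_reward = 1.0 if correct else 0.0
        # Centering removes the group's average PRM score, so the PRM only
        # ranks rollouts against others that share the same outcome.
        centered = score - group_means[correct]
        rewards.append(outcome_reward + weight * centered)
    return rewards


# Four rollouts for one prompt; the third is fluent (high PRM) but wrong.
rewards = outcome_centered_rewards(
    prm_scores=[0.9, 0.6, 0.95, 0.3],
    is_correct=[True, True, False, False],
)
print(rewards)
```

In this sketch the third rollout has the highest raw PRM score but still receives a lower reward than both correct rollouts, because its PRM score is only compared against the other incorrect rollout. Used as an absolute reward, the same PRM score would have ranked it first.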

Findings

01. PROGRS improves Pass@1 scores across multiple math benchmarks.

02. Outcome-conditioned centering reduces bias and enhances reward alignment.

03. PROGRS achieves better results with fewer rollouts than outcome-only methods.

Abstract

Mathematical reasoning in large language models has improved substantially with reinforcement learning using verifiable rewards, where final answers can be checked automatically and converted into reliable training signals. Most such pipelines optimize outcome correctness only, which yields sparse feedback for long, multi-step solutions and offers limited guidance on intermediate reasoning errors. Recent work therefore introduces process reward models (PRMs) to score intermediate steps and provide denser supervision. In practice, PRM scores are often imperfectly aligned with final correctness and can reward locally fluent reasoning that still ends in an incorrect answer. When optimized as absolute rewards, such signals can amplify fluent failure modes and induce reward hacking. We propose PROGRS, a framework that leverages PRMs while keeping outcome correctness dominant. PROGRS treats…
