Process Rewards with Learned Reliability

Jinyuan Li; Langlin Huang; Chengsong Huang; Shaoyang Xu; Donghong Cai; Yuyi Yang; Wenxuan Zhang; Jiaxin Huang

arXiv:2605.15529·cs.CL·May 18, 2026

TL;DR

BetaPRM is a distributional process reward model that predicts, for each reasoning step, both a success probability and the reliability of that prediction. The reliability signal makes step-level feedback more trustworthy and lets downstream methods spend computation more efficiently.

Contribution

The paper proposes BetaPRM, a distributional PRM that learns a reliability signal alongside each step reward, so that downstream methods can tell reliable rewards from uncertain ones and adapt their computation accordingly.

Findings

01. BetaPRM improves Best-of-N reasoning accuracy across benchmarks.

02. ACA reduces token usage by up to 33.57% while increasing accuracy.

03. BetaPRM maintains effective error detection with reliability signals.

Abstract

Process Reward Models (PRMs) provide step-level feedback for reasoning, but current PRMs usually output only a single reward score for each step. Downstream methods must therefore treat imperfect step-level reward predictions as reliable decision signals, with no indication of when these predictions should be trusted. We propose BetaPRM, a distributional PRM that predicts both a step-level success probability and the reliability of that prediction. Given step-success supervision from Monte Carlo continuations, BetaPRM learns a Beta belief that explains the observed number of successful continuations through a Beta-Binomial likelihood, rather than regressing to the finite-sample success ratio as a point target. This learned reliability signal indicates when a step reward should be trusted, enabling downstream applications to distinguish reliable rewards from uncertain ones. As one…
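
To make the Beta-Binomial objective in the abstract concrete, here is a minimal Python sketch of the negative log-likelihood of observing k successful continuations out of n under a Beta(alpha, beta) belief. The function names (`log_beta`, `beta_binomial_nll`, `belief_summary`) are hypothetical, and reading the concentration alpha + beta as a reliability proxy is an assumption made for illustration; the excerpt does not say how BetaPRM parameterizes or reports reliability, so this should not be taken as the authors' implementation.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_nll(k: int, n: int, alpha: float, beta: float) -> float:
    """Negative log-likelihood of k successes in n Monte Carlo
    continuations when the step's success probability follows
    Beta(alpha, beta)."""
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    )
    log_pmf = (
        log_choose
        + log_beta(k + alpha, n - k + beta)
        - log_beta(alpha, beta)
    )
    return -log_pmf


def belief_summary(alpha: float, beta: float) -> tuple[float, float]:
    """Return (mean, concentration) of a Beta belief.

    Assumption for this sketch: the mean plays the role of the step
    reward and the concentration alpha + beta acts as a reliability
    proxy (a larger value means a tighter belief).
    """
    return alpha / (alpha + beta), alpha + beta


if __name__ == "__main__":
    # Two beliefs with the same mean (0.75) but different concentration,
    # scored against the same observation: 6 of 8 continuations succeed.
    for alpha, beta in [(3.0, 1.0), (30.0, 10.0)]:
        mean, conc = belief_summary(alpha, beta)
        nll = beta_binomial_nll(6, 8, alpha, beta)
        print(f"mean={mean:.2f} concentration={conc:.0f} nll={nll:.4f}")
```

Unlike regressing to the finite-sample ratio k/n, this loss depends on both parameters of the belief, so two predictions with the same mean can be scored differently depending on how concentrated they are.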

Code & Models

Repositories

jinyuanli0012/Beta-Binomial-PRM (GitHub)