Know What You Don't Know: Uncertainty Calibration of Process Reward Models

Young-Jin Park; Kristjan Greenewald; Kaveh Alim; Hao Wang; Navid Azizan

arXiv:2506.09338 · stat.ML · November 10, 2025


TL;DR

This paper introduces a calibration method for the process reward models (PRMs) that guide inference-time scaling in large language models. Calibration improves the PRMs' success-probability estimates and enables an adaptive inference strategy that reduces computational cost while maintaining accuracy.

Contribution

The paper presents a quantile-regression calibration technique for PRMs and an instance-adaptive scaling (IAS) framework that dynamically allocates compute based on calibrated success likelihoods.
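As a rough illustration of the quantile-regression idea, the sketch below implements the pinball (quantile) loss, the standard objective whose minimizer is the quantile at level tau. The names `pinball_loss` and `best_constant` and the toy numbers are illustrative assumptions, not taken from the paper; the authors' actual objective and model may differ.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be in (0, 1)")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant(samples, tau):
    """Pick the sample value that minimizes the pinball loss as a constant predictor."""
    n = len(samples)
    return min(samples, key=lambda q: pinball_loss(samples, [q] * n, tau))


# Hypothetical empirical success rates observed for similar partial trajectories.
rates = [0.2, 0.4, 0.6, 0.8, 1.0]
low = best_constant(rates, tau=0.1)   # conservative (lower) estimate
high = best_constant(rates, tau=0.9)  # optimistic (upper) estimate
print(low, high)
```

A low quantile level gives a conservative estimate and a high one gives an optimistic estimate, which is one way to obtain a bound around a predicted success probability.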

Findings

1. Calibration reduces success probability overestimation.

2. Adaptive scaling decreases inference costs.

3. Method outperforms baseline calibration approaches.

Abstract

Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated. Specifically, they tend to overestimate the success probability that a partial reasoning step will lead to a correct final answer, particularly when smaller LLMs are used to complete the reasoning trajectory. To address this, we present a calibration approach, performed via quantile regression, that adjusts PRM outputs to better align with true success probabilities. Leveraging these calibrated success estimates and their associated confidence bounds, we introduce an instance-adaptive scaling (IAS) framework that dynamically adjusts the compute budget based on the estimated likelihood that a partial reasoning trajectory will yield a correct final answer. Unlike conventional…
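To make the idea of adjusting the compute budget from an estimated success likelihood concrete, here is a minimal sketch. It assumes that sampled completions succeed independently with probability at least `p_low` (a conservative calibrated estimate), so that the chance of at least one success among N samples is 1 - (1 - p_low)^N. The function name `adaptive_num_samples`, the `target` level, and the `n_max` cap are hypothetical choices for illustration; the abstract does not specify the paper's exact allocation rule.

```python
import math


def adaptive_num_samples(p_low, target=0.95, n_max=64):
    """Smallest N with 1 - (1 - p_low)**N >= target, clipped to [1, n_max].

    Assumes independent samples; p_low is a conservative success probability.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if p_low <= 0.0:
        return n_max  # no evidence of success: spend the full budget
    if p_low >= 1.0:
        return 1      # success is certain: a single sample suffices
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p_low))
    return max(1, min(n_max, n))


# Easier instances (higher estimated success) receive fewer samples.
for p in (0.9, 0.5, 0.01):
    print(p, adaptive_num_samples(p))
```

Under these assumptions, instances with a high estimated success probability get a small budget, while hard instances are capped at `n_max` instead of growing without bound.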

Code & Models

Datasets

young-j-park/prm_calibration (dataset)

Videos

Know What You Don't Know: Uncertainty Calibration of Process Reward Models (SlidesLive)

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare