Know What You Don't Know: Uncertainty Calibration of Process Reward Models
Young-Jin Park, Kristjan Greenewald, Kaveh Alim, Hao Wang, Navid Azizan

TL;DR
This paper introduces a calibration method for process reward models (PRMs), which guide inference-time scaling of large language models. Calibration improves the PRMs' success-probability estimates and enables an adaptive inference strategy that reduces computational cost while maintaining accuracy.
Contribution
The paper presents a quantile-regression calibration technique for PRMs and an instance-adaptive scaling (IAS) framework that allocates inference compute according to the calibrated likelihood that a partial reasoning trajectory will reach a correct final answer.
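Quantile regression is generally fit by minimizing the pinball (quantile) loss. The sketch below shows that loss and checks that, over constant predictions, it is minimized at the empirical quantile. It is only an illustration of the general technique: the function names, the grid search, and the toy success rates are assumptions made here, not the paper's calibration pipeline.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss at quantile level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit, over-prediction costs 1 - q.
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


def best_constant_quantile(y_true, q, grid):
    """Return the constant from `grid` with the lowest pinball loss (hypothetical helper)."""
    losses = [pinball_loss(y_true, c, q) for c in grid]
    return float(grid[int(np.argmin(losses))])


# Toy data (assumed): observed success rates of completed reasoning trajectories.
success_rates = np.linspace(0.0, 1.0, 11)
grid = np.linspace(0.0, 1.0, 101)
lower_estimate = best_constant_quantile(success_rates, q=0.1, grid=grid)
```

A low quantile level such as `q=0.1` yields a conservative estimate, which is the kind of lower bound an adaptive compute rule could rely on.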
Findings
Calibration reduces the PRMs' overestimation of success probability.
Instance-adaptive scaling lowers inference cost while maintaining accuracy.
The method outperforms baseline calibration approaches.
Abstract
Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated. Specifically, they tend to overestimate the success probability that a partial reasoning step will lead to a correct final answer, particularly when smaller LLMs are used to complete the reasoning trajectory. To address this, we present a calibration approach, performed via quantile regression, that adjusts PRM outputs to better align with true success probabilities. Leveraging these calibrated success estimates and their associated confidence bounds, we introduce an instance-adaptive scaling (IAS) framework that dynamically adjusts the compute budget based on the estimated likelihood that a partial reasoning trajectory will yield a correct final answer. Unlike conventional…
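One way to read "adjusts the compute budget based on the estimated likelihood" is as a sample-count rule. The sketch below assumes each sampled completion succeeds independently with probability at least `p_lower` (a calibrated lower bound) and returns the smallest number of samples N such that 1 - (1 - p_lower)^N reaches a target. The function name, the `target` and `max_samples` parameters, and the independence assumption are illustrative choices made here; the abstract does not specify the exact rule.

```python
import math


def samples_needed(p_lower, target=0.95, max_samples=64):
    """Smallest N with 1 - (1 - p_lower)**N >= target, capped at max_samples.

    Assumes independent samples, each correct with probability >= p_lower.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if p_lower <= 0.0:
        return max_samples  # no evidence of success: spend the full budget
    if p_lower >= 1.0:
        return 1  # success is (estimated to be) certain: one sample suffices
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p_lower))
    return max(1, min(n, max_samples))


# Easier instances (higher calibrated bound) receive fewer samples.
budgets = {p: samples_needed(p) for p in (0.9, 0.5, 0.05)}
```

Under this rule an overestimated success probability leads directly to under-sampling, which is why the calibration step matters for adaptive budgeting.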
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
