Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport

Rachel Ma; Dylan Hadfield-Menell; Kristjan Greenewald

arXiv:2605.06785·cs.LG·May 13, 2026

TL;DR

This paper calibrates Process Reward Models (PRMs) with conditional optimal transport, yielding more reliable confidence estimates and better downstream performance on mathematical reasoning benchmarks.

Contribution

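The paper applies conditional optimal transport to PRM calibration, which the authors describe as the first such use to their knowledge. The learned conditional quantile function is monotonic by construction, so its quantile estimates are structurally valid, and confidence bounds can be extracted at arbitrary levels.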

Findings

01. Significantly improves calibration of PRMs on reasoning benchmarks.

02. Enhances downstream performance in instance-adaptive scaling.

03. Offers a principled approach with structural guarantees for PRM calibration.
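
To make finding 02 concrete, here is a minimal sketch of how a calibrated lower bound on success probability could set a per-instance sampling budget. The rule used (the smallest N for which at least one of N independent samples succeeds with a target probability) is an assumption for illustration and is not necessarily the exact instance-adaptive scaling procedure of the cited framework. The names `samples_needed`, `p_lower`, `target`, and `max_n` are hypothetical.

```python
import math


def samples_needed(p_lower: float, target: float = 0.95, max_n: int = 64) -> int:
    """Smallest N with 1 - (1 - p_lower)**N >= target, capped at max_n.

    Assumes samples succeed independently with probability at least p_lower.
    This is an illustrative budget rule, not the paper's stated procedure.
    """
    if p_lower <= 0.0:
        return max_n          # no evidence of success: spend the full budget
    if p_lower >= 1.0:
        return 1              # certain success: one sample suffices
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p_lower))
    return max(1, min(n, max_n))


# Easy instances get few samples, hard instances get more, up to the cap.
budgets = [samples_needed(p) for p in (0.9, 0.5, 0.05, 0.01)]
```

Under this rule, an overestimated success probability translates directly into too few samples, which is why calibration of the lower bound matters for the downstream budget.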

Abstract

Inference-time scaling methods rely on Process Reward Models (PRMs), which are often poorly calibrated and overestimate success probabilities. We propose, to our knowledge, the first use of conditional optimal transport for calibrating PRMs, modifying conditional OT (CondOT) map learning (Bunne et al., 2022) to estimate a monotonic conditional quantile function over success probabilities estimated by the PRM, conditioned on PRM hidden states. This yields structurally valid quantile estimates and enables efficient extraction of confidence bounds at arbitrary levels, which we integrate into the instance-adaptive scaling (IAS) framework of Park et al. (2025). We evaluate on mathematical reasoning benchmarks spanning moderate-difficulty problems (MATH-500) and harder out-of-distribution problems (AIME). For PRMs with reliable ranking signals, our method substantially improves…
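
As a rough illustration of what "structurally valid" means here, the sketch below builds a conditional quantile function that is non-decreasing by construction and reads off a bound at an arbitrary level. It is not the authors' CondOT architecture: it swaps in a simple linear head with positive increments, and the names `monotone_quantiles`, `quantile_at`, `W`, `b`, and the random "hidden state" `h` are all hypothetical. The connection to optimal transport is that, in one dimension, the optimal transport map from a uniform distribution on [0, 1] to a target distribution is the target's quantile function.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus; output is non-negative.
    return np.logaddexp(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def monotone_quantiles(h, W, b):
    """Map a hidden state h (d,) to K quantile values in [0, 1].

    The first output is a free offset; the remaining outputs are pushed
    through softplus and accumulated, so the result cannot decrease
    across quantile levels (no quantile crossing).
    """
    raw = W @ h + b
    steps = np.concatenate(([raw[0]], softplus(raw[1:])))
    logits = np.cumsum(steps)
    return sigmoid(logits)  # monotone squashing keeps the ordering


def quantile_at(levels, q_values, tau):
    # Linear interpolation between grid levels gives a bound at any tau.
    return float(np.interp(tau, levels, q_values))


rng = np.random.default_rng(0)
h = rng.normal(size=8)                    # stand-in for a PRM hidden state
W = 0.1 * rng.normal(size=(9, 8))         # untrained, illustrative parameters
b = np.zeros(9)
levels = np.linspace(0.1, 0.9, 9)         # quantile levels 0.1, 0.2, ..., 0.9

q = monotone_quantiles(h, W, b)
lower = quantile_at(levels, q, 0.15)      # e.g. a 15% lower confidence bound
```

In practice the parameters would be fit to data (for example with a quantile loss); the point of the sketch is only that the ordering of quantiles holds for any parameter values, rather than being encouraged by a penalty.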
