Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation
Thomas Zollo, Jimmy Wang, Richard Zemel

TL;DR
This paper presents an unsupervised method to calibrate confidence estimates in reasoning language models using only a single inference sample, enhancing reliability without requiring labels or multiple runs.
Contribution
It introduces a self-consistency-based proxy target, derived from offline sampling on unlabeled data, and distills it into a lightweight confidence predictor that needs only a single generation at inference time.
Findings
Substantially outperforms baseline calibration methods across 5 math and question-answering tasks using 9 reasoning models.
Improves downstream performance in selective prediction and simulated decision-making.
Remains effective under distribution shift.
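As an illustration of the selective prediction setting mentioned in the findings, the sketch below shows how a confidence score can be used to abstain on low-confidence answers. It is not taken from the paper: the function name `selective_metrics`, the fixed threshold rule, and the toy numbers are all assumptions made for this example.

```python
def selective_metrics(confidences, correct, threshold):
    """Answer only when confidence >= threshold; abstain otherwise.

    Returns (coverage, selective_accuracy). Hypothetical helper, not the
    paper's evaluation code.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one example")
    # Keep the correctness flags of the examples the model chooses to answer.
    kept = [is_right for conf, is_right in zip(confidences, correct)
            if conf >= threshold]
    coverage = len(kept) / len(confidences)
    # Accuracy is undefined when the model abstains on everything.
    accuracy = sum(kept) / len(kept) if kept else None
    return coverage, accuracy
```

With well-calibrated confidences, raising the threshold should trade coverage for higher accuracy on the answered subset.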
Abstract
Reasoning language models can solve increasingly complex tasks, but struggle to produce the calibrated confidence estimates necessary for reliable deployment. Existing calibration methods usually depend on labels or repeated sampling at inference time, making them impractical in many settings. We introduce a method for unsupervised confidence calibration of reasoning LLMs when only a single generation is available at inference time. Our approach uses offline sampling on unlabeled data to derive a self-consistency-based proxy target, then distills this signal into a lightweight deployment-time confidence predictor. In a broad evaluation across 5 math and question-answering tasks using 9 reasoning models, our method substantially outperforms baselines, including under distribution shift, and improves downstream performance in selective prediction and simulated downstream decision-making.
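The two-stage idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how agreement is measured, which features the deployment-time predictor consumes, or what model family it uses. Here the proxy is assumed to be the fraction of offline samples that agree with a given answer, the predictor is assumed to be a small logistic model, and the names `self_consistency_proxy`, `fit_confidence_predictor`, and `predict_confidence` are hypothetical.

```python
import math
from collections import Counter


def self_consistency_proxy(sampled_answers, answer):
    """Offline stage: fraction of sampled answers that match `answer`.

    Needs no labels, only repeated sampling on unlabeled prompts.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    return Counter(sampled_answers)[answer] / len(sampled_answers)


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_confidence_predictor(features, targets, lr=0.5, epochs=2000):
    """Distillation stage: fit a logistic model to soft proxy targets.

    `features` are per-generation feature vectors (assumed here, e.g. a mean
    token log-probability); `targets` are proxy values in [0, 1].
    Uses plain gradient descent on the cross-entropy loss.
    """
    dim = len(features[0])
    w, b, n = [0.0] * dim, 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(features, targets):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_confidence(w, b, x):
    """Deployment stage: confidence from a single generation's features."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

At deployment only `predict_confidence` runs, so the cost of repeated sampling is paid once, offline, rather than on every query.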
