How Uncertainty Estimation Scales with Sampling in Reasoning Models
Maksym Del, Markus Kängsepp, Marharyta Domnich, Ardi Tampuu, Lisa Yankovskaya, Meelis Kull, Mark Fishel

TL;DR
This paper studies how two black-box uncertainty signals, self-consistency and verbalized confidence, scale with the number of parallel samples in reasoning models across mathematics, STEM, and humanities tasks. Combining the two signals improves uncertainty quality, and the effects differ by domain.
Contribution
The paper analyzes how uncertainty signals scale with sampling across three reasoning models and 17 tasks, highlighting the benefit of hybrid estimators and domain-dependent behavior.
Findings
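Both self-consistency and verbalized confidence improve with more samples, but self-consistency starts with lower discrimination and lags behind verbalized confidence under moderate sampling.
A hybrid estimator that combines the two signals improves AUROC by up to 12 points, even with just two samples.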
Mathematics domain shows stronger uncertainty signals and faster scaling.
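The findings are stated in terms of AUROC, which here measures how well a confidence score separates correct answers from incorrect ones. As a minimal sketch (the function name `auroc` and the toy data are illustrative assumptions, not taken from the paper), AUROC can be computed as the fraction of (correct, incorrect) answer pairs in which the correct answer received the higher confidence, with ties counted as half:

```python
def auroc(scores, labels):
    """Pairwise AUROC: P(score of a correct answer > score of an incorrect one).

    scores: confidence per question (higher = more confident).
    labels: 1 if the model's answer was correct, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correct answer ranked above incorrect one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): four questions, two answered correctly.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
toy_auroc = auroc(toy_scores, toy_labels)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. If "points" refers to the 0 to 100 scale, a 12-point gain corresponds to 0.12 on the 0 to 1 scale used in this sketch.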
Abstract
Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box approach using verbalized confidence and self-consistency. Across three reasoning models and 17 tasks spanning mathematics, STEM, and humanities, we characterize how these signals scale. Both self-consistency and verbalized confidence scale in reasoning models, but self-consistency exhibits lower initial discrimination and lags behind verbalized confidence under moderate sampling. Most uncertainty gains, however, arise from signal combination: with just two samples, a hybrid estimator improves AUROC by up to on average and already outperforms either signal alone even when scaled to much larger budgets, after which returns diminish. These effects are domain-dependent: in mathematics,…
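The two black-box signals and their combination can be sketched in a few lines. The summary does not specify how the paper's hybrid estimator is defined, so the weighted average below is only one plausible combination, chosen for illustration; the function names and the `weight` parameter are hypothetical.

```python
from collections import Counter


def self_consistency(answers):
    """Return the majority answer and the fraction of samples that agree with it."""
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


def mean_verbalized_confidence(answers, confidences, target):
    """Average the model's stated confidence over samples whose answer equals target."""
    matching = [c for a, c in zip(answers, confidences) if a == target]
    return sum(matching) / len(matching)


def hybrid_confidence(answers, confidences, weight=0.5):
    """Assumed hybrid: convex mix of agreement rate and mean stated confidence.

    This is NOT confirmed to be the paper's estimator; it only illustrates
    the idea of combining the two signals from the same set of samples.
    """
    answer, agreement = self_consistency(answers)
    verbalized = mean_verbalized_confidence(answers, confidences, answer)
    return answer, weight * agreement + (1 - weight) * verbalized


# Toy example (made up): four parallel samples for one question, each with a
# final answer and a verbalized confidence in [0, 1].
sample_answers = ["42", "42", "17", "42"]
sample_confidences = [0.9, 0.8, 0.6, 0.7]
final_answer, hybrid = hybrid_confidence(sample_answers, sample_confidences)
```

With only two samples, the agreement rate can take just two values (0.5 or 1.0), so in this sketch the verbalized term supplies the finer-grained part of the score at small budgets.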
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
