TL;DR
This paper investigates whether reasoning models can accurately assess their own uncertainty. It finds that they are often overconfident, that deeper reasoning tends to make the overconfidence worse, and that introspection improves calibration for some models but not all.
Contribution
The paper introduces introspective uncertainty quantification (UQ) for reasoning models and evaluates it across multiple benchmarks and models.
Findings
Reasoning models are often overconfident in their responses.
Deeper reasoning tends to increase overconfidence.
Introspective UQ can improve calibration in some models, but not all.
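To make "overconfident" and "calibration" concrete, here is a minimal Python sketch of two common calibration measures: expected calibration error (ECE) with equal-width bins, and the gap between mean stated confidence and accuracy. This summary does not say which metrics the paper reports, so the choice of ECE, the bin count, and the function names `expected_calibration_error` and `overconfidence_gap` are all assumptions made for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count, not taken from the paper
) -> float:
    """ECE with equal-width bins over [0, 1] (illustrative, not the paper's code)."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group each (confidence, correctness) pair by its confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, is_correct))

    # Weighted average of |mean confidence - accuracy| over non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


def overconfidence_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean stated confidence minus accuracy; positive values mean overconfidence."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(1 for ok in correct if ok) / len(correct)
    return mean_conf - accuracy
```

Under these assumptions, a model that states 90% confidence on every answer but is right only half the time has a positive `overconfidence_gap`, which is the pattern the findings above describe. A well-calibrated model would have both quantities close to zero.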
Abstract
Reasoning language models have set state-of-the-art (SOTA) records on many challenging benchmarks, enabled by multi-step reasoning induced using reinforcement learning. However, like previous language models, reasoning models are prone to generating confident, plausible responses that are incorrect (hallucinations). Knowing when and how much to trust these models is critical to the safe deployment of reasoning models in real-world applications. To this end, we explore uncertainty quantification of reasoning models in this work. Specifically, we ask three fundamental questions: First, are reasoning models well-calibrated? Second, does deeper reasoning improve model calibration? Finally, inspired by humans' innate ability to double-check their thought processes to verify the validity of their answers and their confidence, we ask: can reasoning models improve their calibration by…
