Don't Think Twice! Over-Reasoning Impairs Confidence Calibration
Romain Lacombe, Kerrie Wu, Eddie Dilworth

TL;DR
This paper shows that increasing reasoning steps in large language models can impair confidence calibration, and that search-augmented methods significantly improve confidence accuracy in knowledge-intensive tasks.
Contribution
It challenges the belief that more reasoning always improves calibration, highlighting the importance of information access over reasoning depth.
Findings
Longer reasoning budgets lead to overconfidence and worse calibration.
Search-augmented generation outperforms pure reasoning, achieving 89.3% accuracy in assessing expert confidence, compared with 48.7% for recent reasoning LLMs.
Increasing reasoning steps does not improve, and can harm, confidence calibration.
Abstract
Large Language Models deployed as question answering tools require robust calibration to avoid overconfidence. We systematically evaluate how reasoning capabilities and budget affect confidence assessment accuracy, using the ClimateX dataset (Lacombe et al., 2023) and expanding it to human and planetary health. Our key finding challenges the "test-time scaling" paradigm: while recent reasoning LLMs achieve 48.7% accuracy in assessing expert confidence, increasing reasoning budgets consistently impairs rather than improves calibration. Extended reasoning leads to systematic overconfidence that worsens with longer thinking budgets, producing diminishing and negative returns beyond modest computational investments. Conversely, search-augmented generation dramatically outperforms pure reasoning, achieving 89.3% accuracy by retrieving relevant evidence. Our results suggest that information…
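As a rough illustration of what "accuracy in assessing expert confidence" and "systematic overconfidence" could mean in practice, the sketch below scores predicted confidence labels against expert labels on an ordinal scale. The four-level scale, the function name `calibration_summary`, and the metric names are assumptions made for this example; they are not taken from the paper's evaluation code.

```python
from typing import Dict, Sequence

# Assumed ordinal confidence scale (illustrative; not confirmed by the text above).
LEVELS = ["low", "medium", "high", "very high"]
RANK = {label: i for i, label in enumerate(LEVELS)}


def calibration_summary(predicted: Sequence[str], expert: Sequence[str]) -> Dict[str, float]:
    """Compare predicted confidence labels with expert labels (hypothetical helper).

    accuracy            : fraction of exact label matches
    overconfidence_bias : mean signed rank difference (predicted - expert);
                          positive values mean the model tends to be overconfident
    overconfident_rate  : fraction of items where predicted rank exceeds expert rank
    """
    if len(predicted) != len(expert) or len(predicted) == 0:
        raise ValueError("predicted and expert must be non-empty and of equal length")

    diffs = [RANK[p] - RANK[e] for p, e in zip(predicted, expert)]
    n = len(diffs)
    return {
        "accuracy": sum(d == 0 for d in diffs) / n,
        "overconfidence_bias": sum(diffs) / n,
        "overconfident_rate": sum(d > 0 for d in diffs) / n,
    }


if __name__ == "__main__":
    predicted = ["high", "very high", "medium", "high"]
    expert = ["high", "high", "low", "very high"]
    print(calibration_summary(predicted, expert))
```

Under this framing, a model whose `overconfidence_bias` grows as its reasoning budget increases would show the pattern the abstract describes, even if its exact-match accuracy stays roughly flat.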
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
