The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models
Robert Welch, Emir Konuk, Kevin Smith

TL;DR
This paper examines how chain-of-thought reasoning affects uncertainty quantification in vision-language models. Reasoning can improve task accuracy yet degrade the quality of most uncertainty estimates, leaving the model overconfident; the authors attribute this to implicit answer conditioning.
Contribution
The paper characterizes how reasoning affects uncertainty estimates in VLMs and identifies implicit answer conditioning as the primary mechanism behind the resulting overconfidence.
Findings
Reasoning degrades the quality of most uncertainty estimates, even when it improves task accuracy.
Implicit answer conditioning causes overconfidence.
Agreement-based consistency remains reliable for UQ.
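The agreement-based consistency named in the last finding can be illustrated with a minimal sketch. It assumes the model has been sampled several times on the same question and that each sample has already been reduced to a short final answer string; the function name `agreement_confidence` and the normalization step are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter


def agreement_confidence(answers):
    """Return (majority_answer, fraction of samples agreeing with it).

    `answers` is a list of final-answer strings, one per sampled generation.
    Normalization here (strip + lowercase) is an illustrative assumption.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    majority, count = Counter(normalized).most_common(1)[0]
    return majority, count / len(normalized)


# Five hypothetical samples for one multiple-choice question.
samples = ["B", "b", "B ", "C", "B"]
answer, confidence = agreement_confidence(samples)
print(answer, confidence)
```

Because this score depends only on how often independently sampled answers coincide, it does not read the token probabilities of any single generation.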
Abstract
Vision-language models (VLMs) are increasingly deployed in high-stakes settings where reliable uncertainty quantification (UQ) is as important as predictive accuracy. Extended reasoning via chain-of-thought (CoT) prompting or reasoning-trained models has become ubiquitous in modern VLM pipelines, yet its effect on UQ reliability remains poorly understood. We show that reasoning consistently degrades the quality of most uncertainty estimates, even when it improves task accuracy. We identify implicit answer conditioning as the primary mechanism: as reasoning traces converge on a conclusion before the final answer is generated, token probabilities increasingly reflect consistency with the model's own reasoning trace rather than uncertainty about correctness. In effect, the model becomes overconfident in its answer. In contrast, agreement-based consistency remains robust and often improves…
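For contrast, a token-probability confidence score of the kind the abstract describes can be sketched as follows. The abstract does not say which token-probability estimators are evaluated, so the two variants here (joint sequence probability and length-normalized probability) are assumptions, as is the function name `token_confidence`. The input is a list of log-probabilities for the final answer tokens, each conditioned on everything generated before it, including any reasoning trace.

```python
import math


def token_confidence(answer_logprobs):
    """Confidence scores from per-token log-probabilities of the answer.

    Each log-probability is conditioned on the full preceding context,
    so a reasoning trace that already commits to an answer can push
    these values toward 0 (probability 1) regardless of correctness.
    """
    if not answer_logprobs:
        raise ValueError("need at least one answer token")
    total = sum(answer_logprobs)
    return {
        # Joint probability of the whole answer span.
        "sequence_prob": math.exp(total),
        # Geometric mean of per-token probabilities (length-normalized).
        "mean_token_prob": math.exp(total / len(answer_logprobs)),
    }


# Hypothetical log-probabilities for a two-token answer.
scores = token_confidence([math.log(0.5), math.log(0.5)])
print(scores)
```

Both scores are functions of the conditional distribution alone, which is why conditioning on a trace that has already converged on a conclusion can inflate them.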
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Neurobiology of Language and Bilingualism
