Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models
Abin Shoby, Ta Duc Huy, Tuan Dung Nguyen, Minh Khoi Ho, Qi Chen, Anton van den Hengel, Phi Le Nguyen, Johan W. Verjans, Vu Minh Hieu Phan

TL;DR
This paper investigates object hallucination in vision-language models, shows that overthinking (repeated revision of object hypotheses across decoder layers) contributes to it, and proposes a detection method built on that insight.
Contribution
The authors introduce the Overthinking Score, a novel metric that captures model reasoning dynamics across layers to improve hallucination detection in vision-language models.
Findings
Overthinking behavior correlates with hallucination instances.
The Overthinking Score improves detection F1 to 78.9% on MSCOCO.
Intermediate layers often converge to an incorrect hypothesis before the final output, which can leave the model confident in a hallucinated object.
Abstract
Vision-language models (VLMs) often hallucinate non-existent objects. Detecting hallucination is analogous to detecting deception: a single final statement is insufficient; one must examine the underlying reasoning process. Yet existing detectors rely mostly on final-layer signals. Attention-based methods assume hallucinated tokens exhibit low attention, while entropy-based ones use final-step uncertainty. Our analysis reveals the opposite: hallucinated objects can exhibit peaked attention due to contextual priors, and models often express high confidence because intermediate layers have already converged to an incorrect hypothesis. We show that the key to hallucination detection lies within the model's thought process, not its final output. By probing decoder layers, we uncover a previously overlooked behavior, overthinking: models repeatedly revise object hypotheses across layers…
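The abstract is truncated before it defines the Overthinking Score, so the sketch below is not the authors' metric. It only illustrates what "revising hypotheses across layers" could mean in code, assuming per-layer logits for one generated token have already been obtained (for example by projecting each decoder layer's hidden state through the output head). The names top1_per_layer and revision_rate are hypothetical and chosen for this illustration.

```python
from typing import List, Sequence


def top1_per_layer(layer_logits: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest-scoring vocabulary entry at each layer.

    layer_logits[l][v] is the (assumed) score of vocabulary entry v at layer l.
    Ties resolve to the lowest index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in layer_logits]


def revision_rate(layer_logits: Sequence[Sequence[float]]) -> float:
    """Fraction of consecutive layer pairs whose top-1 hypothesis differs.

    Hypothetical proxy for "overthinking": 0.0 means the top-1 hypothesis
    never changes, 1.0 means it changes at every layer transition.
    """
    top1 = top1_per_layer(layer_logits)
    if len(top1) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(top1, top1[1:]))
    return changes / (len(top1) - 1)


# Toy inputs: 4 layers, vocabulary of 3 entries (made-up numbers).
stable = [[0.1, 2.0, 0.3]] * 4
unstable = [
    [2.0, 0.1, 0.3],
    [0.1, 2.0, 0.3],
    [0.2, 0.1, 2.0],
    [0.1, 2.0, 0.3],
]

print(revision_rate(stable), revision_rate(unstable))
```

A detector built on this idea would threshold or learn from such a per-token signal; how the paper actually aggregates layer dynamics is not stated in the excerpt above.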
