Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

Abin Shoby; Ta Duc Huy; Tuan Dung Nguyen; Minh Khoi Ho; Qi Chen; Anton van den Hengel; Phi Le Nguyen; Johan W. Verjans; Vu Minh Hieu Phan

arXiv:2603.07619 · cs.CV · March 31, 2026

TL;DR

This paper investigates hallucination in vision-language models. It finds that overthinking, in which the model repeatedly revises object hypotheses across decoder layers, contributes to hallucination, and it proposes a detection method based on this insight.

Contribution

The authors introduce the Overthinking Score, a novel metric that captures model reasoning dynamics across layers to improve hallucination detection in vision-language models.

Findings

01. Overthinking behavior correlates with hallucination instances.

02. The Overthinking Score improves detection F1 to 78.9% on MSCOCO.

03. Intermediate layers often contain incorrect hypotheses before the final output is produced.

Abstract

Vision-language models (VLMs) often hallucinate non-existent objects. Detecting hallucination is analogous to detecting deception: a single final statement is insufficient, and one must examine the underlying reasoning process. Yet existing detectors rely mostly on final-layer signals. Attention-based methods assume hallucinated tokens exhibit low attention, while entropy-based ones use final-step uncertainty. Our analysis reveals the opposite: hallucinated objects can exhibit peaked attention due to contextual priors, and models often express high confidence because intermediate layers have already converged to an incorrect hypothesis. We show that the key to hallucination detection lies within the model's thought process, not its final output. By probing decoder layers, we uncover a previously overlooked behavior, overthinking: models repeatedly revise object hypotheses across layers…
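To make the idea of "revising object hypotheses across layers" concrete, here is a minimal sketch of a crude proxy: counting how often the top-1 prediction changes from one decoder layer to the next. This is not the paper's Overthinking Score, whose definition is not given in this excerpt. The function names (`top1_per_layer`, `count_revisions`) and the toy logits are hypothetical, and the sketch assumes per-layer logits have already been obtained, for example by projecting each layer's hidden state through the model's output head.

```python
from typing import List, Sequence


def top1_per_layer(layer_logits: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest-scoring vocabulary entry at each layer.

    layer_logits[i][j] is the (assumed) logit of token j read out at layer i.
    Ties resolve to the lowest index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in layer_logits]


def count_revisions(layer_logits: Sequence[Sequence[float]]) -> int:
    """Count how many times the top-1 hypothesis changes between adjacent layers.

    Hypothetical proxy for 'overthinking'; not the paper's metric.
    """
    top1 = top1_per_layer(layer_logits)
    return sum(1 for prev, cur in zip(top1, top1[1:]) if prev != cur)


# Toy example: 4 layers over a 3-token vocabulary (made-up numbers).
stable = [
    [2.0, 0.1, 0.1],
    [2.5, 0.2, 0.1],
    [3.0, 0.1, 0.0],
    [3.5, 0.0, 0.0],
]
revising = [
    [2.0, 0.1, 0.1],
    [0.2, 2.5, 0.1],
    [0.1, 0.3, 3.0],
    [0.0, 3.5, 0.2],
]

stable_revisions = count_revisions(stable)      # top-1 never changes
revising_revisions = count_revisions(revising)  # top-1 goes 0 -> 1 -> 2 -> 1
```

In the toy data, both sequences end with a confident final-layer prediction, so a final-layer-only signal would not separate them, while the revision count does. Whether such a count tracks hallucination in a real model is an empirical question this sketch does not answer.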
