Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

William Rudman; Michal Golovanevsky; Dana Arad; Yonatan Belinkov; Ritambhara Singh; Carsten Eickhoff; Kyle Mahowald

arXiv:2601.05201·cs.CV·April 20, 2026

TL;DR

This paper studies prompt-induced hallucination in vision-language models, where a model follows a textual prompt over conflicting visual evidence. It identifies a small set of attention heads that drive this behavior and shows that ablating them reduces such hallucinations by at least 40%.

Contribution

The paper identifies the attention heads that mediate prompt-induced hallucination and shows that ablating them mitigates the problem without additional training.

Findings

01

A small set of attention heads is responsible for prompt-copying behavior.

02

Ablating these heads reduces prompt-induced hallucinations by at least 40%, without additional training.

03

Ablating these heads makes models more likely to correct the prompt toward the visual evidence.
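
As a rough illustration of what ablating an attention head means, the sketch below implements a toy multi-head self-attention layer in NumPy and drops the contribution of selected heads. It is not the authors' code: the function name `multi_head_attention`, the weight layout, the absence of a causal mask, and the choice of zero ablation (rather than, say, mean ablation) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, ablate_heads=()):
    """Toy self-attention layer (hypothetical, no causal mask).

    x:             (seq_len, d_model) input activations
    w_q, w_k, w_v: (n_heads, d_model, d_head) per-head projections
    w_o:           (n_heads, d_head, d_model) per-head output projections
    ablate_heads:  indices of heads whose output is zeroed (assumed zero ablation)
    """
    n_heads, _, d_head = w_q.shape
    out = np.zeros_like(x)
    for h in range(n_heads):
        if h in ablate_heads:
            continue  # an ablated head adds nothing to the layer output
        q, k, v = x @ w_q[h], x @ w_k[h], x @ w_v[h]
        attn = softmax(q @ k.T / np.sqrt(d_head))
        out += (attn @ v) @ w_o[h]
    return out


# Random toy weights, purely illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(4, 8, 2)) for _ in range(3))
w_o = rng.normal(size=(4, 2, 8))

full = multi_head_attention(x, w_q, w_k, w_v, w_o)
ablated = multi_head_attention(x, w_q, w_k, w_v, w_o, ablate_heads={1})
```

Because each head's output is summed into the layer output, `full - ablated` isolates the contribution of head 1. In a real VLM the same idea is usually applied inside the model's forward pass, which this sketch does not attempt.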

Abstract

Large vision-language models (VLMs) are highly capable, yet often hallucinate by favoring textual prompts over visual evidence. We study this failure mode in a controlled object-counting setting, where the prompt overstates the number of objects in the image (e.g., asking a model to describe four waterlilies when only three are present). At low object counts, models often correct the overestimation, but as the number of objects increases, they increasingly conform to the prompt regardless of the discrepancy. Through mechanistic analysis of three VLMs, we identify a small set of attention heads whose ablation substantially reduces prompt-induced hallucinations (PIH) by at least 40% without additional training. Across models, PIH-heads mediate prompt copying in model-specific ways. We characterize these differences and show that PIH ablation increases correction toward visual evidence.…
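
The counting setup described in the abstract suggests a simple way to score responses: compare the count a model states with the count in the prompt and the count in the image. The sketch below is one possible scoring scheme, not the paper's evaluation code. The labels `corrected`, `conformed`, and `other`, the function names, and the example trials are hypothetical, and it assumes the stated count has already been parsed from the model's response.

```python
from collections import Counter


def classify_response(prompt_count, true_count, response_count):
    """Label one trial, assuming prompt_count != true_count.

    "corrected": the response matches the visual evidence
    "conformed": the response copies the overstated prompt count
    "other":     the response matches neither
    """
    if response_count == true_count:
        return "corrected"
    if response_count == prompt_count:
        return "conformed"
    return "other"


def outcome_rates(trials):
    """Return the fraction of trials with each label.

    trials: list of (prompt_count, true_count, response_count) tuples
    """
    if not trials:
        raise ValueError("trials must be non-empty")
    counts = Counter(classify_response(*t) for t in trials)
    return {label: counts[label] / len(trials)
            for label in ("corrected", "conformed", "other")}


# Made-up trials for illustration only; these are not results from the paper.
example_trials = [(4, 3, 3), (4, 3, 4), (8, 7, 8), (8, 7, 6)]
rates = outcome_rates(example_trials)
```

Under this scheme, the `conformed` rate plays the role of a prompt-induced hallucination rate, and comparing it before and after an intervention gives one way to express a reduction.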

Code & Models

Datasets

MM-Hallu/pih (dataset, 122 downloads)
