When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs

Fanpu Cao, Xin Zou, Xuming Hu, Hui Xiong

arXiv:2605.11559 · cs.CV · May 13, 2026


TL;DR

This paper introduces LaSCD, a decoding method that uses Laplacian energy of visual attention to identify and reduce hallucinations in multimodal large language models without additional training.
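The page does not spell out how the Laplacian energy of visual attention is computed. A common definition, assumed here and not confirmed by the source, treats one query token's attention over the image patches as a signal on the 2D patch grid and sums the squared differences between 4-neighbouring patches (equivalently a^T L a for the grid-graph Laplacian L). High values mean sharp, high-frequency variation across neighbouring patches. The sketch below computes this per layer; the function names, the head-averaged (H, W) input layout, and the normalisation to unit attention mass are all hypothetical choices.

```python
import numpy as np


def laplacian_energy(attn_map: np.ndarray) -> float:
    """Dirichlet (Laplacian) energy of a 2D attention map on a 4-neighbour grid.

    Returns sum over adjacent patch pairs (i, j) of (a_i - a_j)^2, which equals
    a^T L a for the grid-graph Laplacian L. Assumed definition, for illustration.
    """
    a = np.asarray(attn_map, dtype=np.float64)
    # Normalise to unit mass so the score reflects the *shape* of the attention,
    # not how much total attention the image tokens received (assumption).
    a = a / a.sum()
    dh = np.diff(a, axis=1)  # differences between horizontal neighbours
    dv = np.diff(a, axis=0)  # differences between vertical neighbours
    return float((dh ** 2).sum() + (dv ** 2).sum())


def layerwise_energy(attn_per_layer: np.ndarray) -> np.ndarray:
    """Energy for each layer.

    attn_per_layer: shape (num_layers, H, W), the current query token's attention
    over the image-patch grid, already averaged over heads (hypothetical layout).
    """
    return np.array([laplacian_energy(m) for m in attn_per_layer])
```

A uniform map scores 0.0, while a map whose mass sits entirely on one interior patch scores 4.0 (a unit difference to each of its four neighbours). Ranking layers by this score is one plausible way to look for the layers the abstract describes; the paper's actual selection rule may differ.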

Contribution

The paper shows that hallucination is tied to the high-frequency structure of visual attention, not only to how much attention the image receives, and proposes a training-free decoding strategy to mitigate it.

Findings

01. LaSCD reduces hallucinations across multiple benchmarks.
02. Laplacian energy identifies the layers where hallucinations emerge.
03. The method preserves the models' general capabilities.

Abstract

Multimodal large language models (MLLMs) have become a key interface for visual reasoning and grounded question answering, yet they remain vulnerable to visual hallucinations, where generated responses contradict image content or mention nonexistent objects. A central challenge is that hallucination is not always caused by a simple lack of visual attention: the model may still assign substantial attention mass to image tokens while internally drifting toward an incorrect answer. In this paper, we show that the high-frequency structure of visual attention, measured by layer-wise Laplacian energy, reveals both the layer where hallucinated preferences emerge and the layer where the ground-truth answer transiently recovers. Building on this finding, we propose LaSCD (Laplacian-Spectral Contrastive Decoding), a training-free decoding strategy that selects informative layers via Laplacian…
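The abstract is cut off before it explains how LaSCD contrasts the selected layers. For orientation only, here is a generic contrastive-decoding step in the style of earlier training-free methods such as DoLa and VCD: an "expert" distribution is amplified against an "amateur" one, and candidates are restricted by an adaptive plausibility constraint. Which layers play the expert and amateur roles in LaSCD, and the values of the hypothetical parameters `alpha` and `beta`, are assumptions, not details taken from the paper.

```python
import numpy as np


def log_softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over a 1D logit vector."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())


def contrastive_next_token(expert_logits, amateur_logits, alpha=1.0, beta=0.1):
    """Greedy next-token choice under a generic contrastive-decoding rule.

    expert_logits / amateur_logits: shape (vocab,). Supplying them from, e.g., a
    layer where the correct answer "recovers" vs. one where the hallucinated
    preference "emerges" is a hypothetical choice, not the paper's stated rule.
    """
    expert_lp = log_softmax(np.asarray(expert_logits, dtype=np.float64))
    amateur_lp = log_softmax(np.asarray(amateur_logits, dtype=np.float64))
    # Adaptive plausibility constraint: keep only tokens whose expert
    # probability is at least beta times the expert's top probability.
    plausible = expert_lp >= np.log(beta) + expert_lp.max()
    # Amplify what the expert prefers beyond the amateur.
    scores = (1.0 + alpha) * expert_lp - alpha * amateur_lp
    scores = np.where(plausible, scores, -np.inf)
    return int(np.argmax(scores))
```

With expert logits `[2.0, 1.9, -5.0]` and amateur logits `[2.0, 0.0, -5.0]`, plain greedy decoding (`alpha=0`) picks token 0, while `alpha=1` picks token 1, the token the expert favours much more than the amateur does. The plausibility constraint keeps the contrast from promoting tokens the expert itself considers very unlikely.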


Code & Models

Repositories

macovaseas/LaSCD (GitHub)
