Finding the Correct Visual Evidence Without Forgetting: Mitigating Hallucination in LVLMs via Inter-Layer Visual Attention Discrepancy

Yutong Xie; Zhenglin Hua; Ran Wang; Wing W. Y. Ng; Xizhao Wang; Yuheng Jia

arXiv:2605.20965 · cs.CV · May 21, 2026

TL;DR

This paper introduces ILVAD, a training-free, plug-and-play method that reduces hallucinations in LVLMs by enhancing attention to visual evidence based on inter-layer attention discrepancies.

Contribution

The paper uncovers layer-specific sensitivity to visual evidence in LVLMs and proposes a novel attention discrepancy-based method to mitigate hallucinations without additional training.

Findings

01 ILVAD consistently reduces hallucinations across five LVLMs.

02 The method improves visual grounding and response accuracy.

03 It is effective across various architectures and tasks.

Abstract

Large Vision-Language Models (LVLMs) have shown remarkable performance on a wide range of vision-language tasks. Despite this progress, they are still prone to hallucination, generating responses that are inconsistent with visual content. In this work, we find that LVLMs tend to hallucinate when they pay insufficient attention to the correct visual evidence and gradually forget it during the generation process. We empirically find that although LVLMs overall attend insufficiently to visual evidence, they exhibit sensitivity to the correct visual evidence in specific layers, with notable inter-layer discrepancy. Motivated by this observation, we propose a novel hallucination mitigation method that enhances visual evidence based on Inter-Layer Visual Attention Discrepancy (ILVAD). Specifically, we obtain the attention weights from early generated tokens to visual tokens across layers and…
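
The abstract's first step, collecting attention from early generated tokens to visual tokens in each layer, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the nested-list layout `attn[layer][token][position]`, the function names, and in particular the max-minus-min spread, which is only a generic stand-in for "inter-layer discrepancy" and is not the paper's definition (the abstract is truncated before it states one).

```python
from typing import List

# Hypothetical layout: attn[layer][token][position] holds head-averaged
# attention weights from each early generated token to every input position.
Attention = List[List[List[float]]]


def visual_attention_per_layer(
    attn: Attention, visual_start: int, visual_end: int
) -> List[List[float]]:
    """For each layer, average over generated tokens the attention
    paid to each visual token in [visual_start, visual_end)."""
    width = visual_end - visual_start
    per_layer = []
    for layer in attn:
        sums = [0.0] * width
        for token_row in layer:
            for j in range(width):
                sums[j] += token_row[visual_start + j]
        per_layer.append([s / len(layer) for s in sums])
    return per_layer


def inter_layer_spread(per_layer: List[List[float]]) -> List[float]:
    """Illustrative discrepancy only (an assumption, not the paper's formula):
    max minus min attention across layers, per visual token."""
    width = len(per_layer[0])
    return [
        max(layer[j] for layer in per_layer) - min(layer[j] for layer in per_layer)
        for j in range(width)
    ]


# Toy input: 2 layers, 2 generated tokens, 4 positions; positions 1-2 are visual.
toy = [
    [[0.5, 0.25, 0.25, 0.0], [0.5, 0.25, 0.25, 0.0]],   # layer 0
    [[0.25, 0.5, 0.0, 0.25], [0.25, 0.75, 0.0, 0.0]],   # layer 1
]
per_layer = visual_attention_per_layer(toy, visual_start=1, visual_end=3)
spread = inter_layer_spread(per_layer)
```

In this toy input, the first visual token receives far more attention in layer 1 than in layer 0, which is the kind of layer-specific sensitivity the abstract describes. How ILVAD actually turns such differences into an enhancement signal should be taken from the paper or the ytx-ML/ILVAD repository.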

Code & Models

Repositories

ytx-ML/ILVAD (GitHub)
