Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang, Junkai Chen, Beier Zhu, Tingjin Luo, Yankun Shen, Xu Yang

TL;DR
This paper investigates how the middle layers of large vision-language models contribute to object hallucinations. It uses attention analysis to identify the stages of visual processing and develops a simple method that mitigates hallucinations at inference time without extra training.
Contribution
It reveals the significance of middle layers in hallucination formation and proposes an attention-based adjustment method to reduce hallucinations during inference.
Findings
Middle layers are crucial in processing visual information in LVLMs.
Attention patterns can distinguish between real and hallucinated tokens.
A simple attention adjustment method effectively reduces hallucinations.
Abstract
Hallucinations in Large Vision-Language Models (LVLMs) significantly undermine their reliability, motivating researchers to explore the causes of hallucination. However, most studies primarily focus on the language aspect rather than the visual. In this paper, we address how LVLMs process visual information and whether this process causes hallucination. Firstly, we use the attention lens to identify the stages at which LVLMs handle visual data, discovering that the middle layers are crucial. Moreover, we find that these layers can be further divided into two stages: "visual information enrichment" and "semantic refinement", which respectively propagate visual data to object tokens and interpret it through text. By analyzing attention patterns during the visual information enrichment stage, we find that real tokens consistently receive higher attention weights than hallucinated ones,…
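The observation that real tokens receive more attention to the image than hallucinated ones in the middle layers can be illustrated with a small sketch. This is not the paper's implementation: the layer range, the threshold, the nested-list attention layout, and every function name below are assumptions made for illustration, and the paper's actual detection criterion and attention adjustment are not given in this summary.

```python
def visual_attention_score(attn, image_positions, layers):
    """Mean attention mass that one generated token places on image tokens.

    attn[layer][head] is a list of attention weights from the generated
    token to every context position (hypothetical layout).
    image_positions is a slice selecting the image-token positions.
    layers is an iterable of layer indices treated as "middle" layers.
    """
    total, count = 0.0, 0
    for layer in layers:
        for head_weights in attn[layer]:
            total += sum(head_weights[image_positions])
            count += 1
    if count == 0:
        raise ValueError("no layers or heads selected")
    return total / count


def is_likely_hallucinated(score, threshold):
    """Flag a token whose image-attention score falls below a threshold.

    The threshold is a hypothetical hyperparameter, not a value from the paper.
    """
    return score < threshold


def make_toy_attention(image_mass, n_layers=4, n_heads=2, n_image=4, n_text=2):
    """Build synthetic attention rows that put `image_mass` on image tokens."""
    row = [image_mass / n_image] * n_image + [(1.0 - image_mass) / n_text] * n_text
    return [[list(row) for _ in range(n_heads)] for _ in range(n_layers)]


if __name__ == "__main__":
    image_positions = slice(0, 4)   # assumed: image tokens come first
    middle_layers = range(1, 3)     # assumed "middle" layers of a 4-layer toy
    for name, mass in [("real-like token", 0.6), ("hallucinated-like token", 0.2)]:
        score = visual_attention_score(make_toy_attention(mass), image_positions, middle_layers)
        print(name, round(score, 3), is_likely_hallucinated(score, threshold=0.4))
```

In practice, the attention weights would come from the model's own attention maps for each generated object token, and the layer range and threshold would need to be chosen empirically.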
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Neural and Behavioral Psychology Studies · Functional Brain Connectivity Studies
Methods: Attention Is All You Need · Softmax · Linear Layer · Focus · Multi-Head Attention
