Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens

Zhangqi Jiang; Junkai Chen; Beier Zhu; Tingjin Luo; Yankun Shen; Xu Yang

arXiv:2411.16724 · cs.CV · April 2, 2025

TL;DR

This paper investigates how the middle layers of large vision-language models (LVLMs) contribute to object hallucinations. It uses attention analysis to identify the stages at which visual information is processed, and proposes a simple method that mitigates hallucinations without extra training.

Contribution

It reveals the significance of middle layers in hallucination formation and proposes an attention-based adjustment method to reduce hallucinations during inference.

Findings

1. Middle layers are crucial in processing visual information in LVLMs.
2. Attention patterns can distinguish between real and hallucinated tokens.
3. A simple attention adjustment method effectively reduces hallucinations.

Abstract

Hallucinations in Large Vision-Language Models (LVLMs) significantly undermine their reliability, motivating researchers to explore the causes of hallucination. However, most studies primarily focus on the language aspect rather than the visual. In this paper, we address how LVLMs process visual information and whether this process causes hallucination. Firstly, we use the attention lens to identify the stages at which LVLMs handle visual data, discovering that the middle layers are crucial. Moreover, we find that these layers can be further divided into two stages: "visual information enrichment" and "semantic refinement", which respectively propagate visual data to object tokens and interpret it through text. By analyzing attention patterns during the visual information enrichment stage, we find that real tokens consistently receive higher attention weights than hallucinated ones,…
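
The observation that real tokens receive more attention than hallucinated ones can be illustrated with a small sketch. It is not the authors' implementation: the function name, the nested-list layout attn[layer][head][key_position], the choice of which layers count as "middle", and the aggregation (sum over image positions, then mean over heads and layers) are all assumptions made for illustration, and the attention values are made-up toy numbers.

```python
from statistics import mean


def visual_attention_score(attn, image_positions, layers):
    """Average attention mass that one generated token places on image tokens.

    attn[layer][head][key] is the attention weight from the generated token
    to key position `key` (hypothetical layout, assumed for this sketch).
    """
    per_layer = []
    for layer in layers:
        # For each head, sum the attention that lands on image positions.
        head_sums = [sum(head[k] for k in image_positions) for head in attn[layer]]
        per_layer.append(mean(head_sums))
    # Average over the selected layers.
    return mean(per_layer)


# Toy data: 3 layers, 2 heads, 4 key positions; positions 0 and 1 are image tokens.
attn_real = [
    [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.4, 0.4]],
    [[0.4, 0.3, 0.2, 0.1], [0.3, 0.2, 0.3, 0.2]],  # assumed "middle" layer
    [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.4, 0.4]],
]
attn_hallucinated = [
    [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.4, 0.4]],
    [[0.1, 0.1, 0.4, 0.4], [0.2, 0.1, 0.4, 0.3]],  # assumed "middle" layer
    [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.4, 0.4]],
]

image_positions = [0, 1]
middle_layers = [1]  # assumption: which layers are "middle" depends on the model

real_score = visual_attention_score(attn_real, image_positions, middle_layers)
hallucinated_score = visual_attention_score(
    attn_hallucinated, image_positions, middle_layers
)
print(real_score > hallucinated_score)
```

In this toy setup the token labeled real places more middle-layer attention on the image positions than the token labeled hallucinated, which is the kind of gap a detector could threshold on. With a real model, the attention weights would come from the model's own attention outputs rather than hand-written lists.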

Code & Models

Repositories

zhangqijiang07/middle_layers_indicating_hallucinations
PyTorch · Official

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Neural and Behavioral Psychology Studies · Functional Brain Connectivity Studies

Methods: Attention Is All You Need · Softmax · Linear Layer · Focus · Multi-Head Attention