Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, QianYing Wang, Ping Chen, Xiaoqin Zhang, Shijian Lu

TL;DR
This paper introduces AGLA, a training-free method that combines global and local attention to reduce object hallucinations in vision-language models by enhancing visual grounding and prompt relevance.
Contribution
The paper proposes a novel, plug-and-play approach called AGLA that effectively mitigates hallucinations in LVLMs by assembling global and local image features without additional training.
Findings
AGLA significantly reduces object hallucinations in LVLMs.
The method improves visual grounding and prompt relevance in generated responses.
AGLA demonstrates wide applicability across various tasks.
Abstract
Despite great success across various multimodal tasks, Large Vision-Language Models (LVLMs) often encounter object hallucinations with generated textual responses being inconsistent with the actual objects in images. We examine different LVLMs and pinpoint that one root cause of object hallucinations lies with deficient attention on discriminative image features. Specifically, LVLMs often predominantly attend to prompt-irrelevant global features instead of prompt-relevant local features, undermining their visual grounding capacity and leading to object hallucinations. We propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates hallucinations by assembling global features for response generation and local features for visual discrimination simultaneously. Specifically, we introduce an image-prompt matching scheme that captures…
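The abstract excerpt above is cut off before it says how the global and local views are combined, so the following is only an illustrative sketch of one plausible reading: the model scores the next token once from the full image (global view) and once from a prompt-focused view of the image (local view), and the two sets of logits are added before the softmax. The function names, the weight `alpha`, the additive rule, and the toy numbers are all assumptions made for illustration; they are not taken from the text above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assemble_logits(global_logits, local_logits, alpha=1.0):
    """Combine next-token logits from a global and a local image view.

    Hypothetical helper: the additive rule and `alpha` are assumptions,
    not details stated in the abstract excerpt.
    """
    if len(global_logits) != len(local_logits):
        raise ValueError("both views must score the same vocabulary")
    combined = [g + alpha * l for g, l in zip(global_logits, local_logits)]
    return softmax(combined)


# Toy vocabulary and made-up logits, for illustration only.
vocab = ["dog", "cat", "frisbee"]
global_logits = [2.0, 1.9, 0.5]  # full image: "dog" and "cat" nearly tied
local_logits = [2.5, 0.2, 0.4]   # prompt-focused view: favours "dog"

global_only = softmax(global_logits)
assembled = assemble_logits(global_logits, local_logits, alpha=1.0)

for token, p_global, p_assembled in zip(vocab, global_only, assembled):
    print(f"{token}: global-only={p_global:.3f} assembled={p_assembled:.3f}")
```

In this toy case the local view breaks the near-tie between "dog" and "cat" in favour of the token it supports, which is the kind of visual discrimination the abstract attributes to local features. Setting `alpha=0` recovers the global-only distribution.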
Taxonomy
Topics: Cell Image Analysis Techniques · Epilepsy research and treatment · Functional Brain Connectivity Studies
Methods: Softmax · Attention Is All You Need
