Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Wenbin An; Feng Tian; Sicong Leng; Jiahao Nie; Haonan Lin; QianYing Wang; Ping Chen; Xiaoqin Zhang; Shijian Lu

arXiv:2406.12718·cs.CV·March 17, 2025·2 cites

TL;DR

This paper introduces AGLA, a training-free method that combines global and local attention to reduce object hallucinations in vision-language models by enhancing visual grounding and prompt relevance.

Contribution

The paper proposes a novel, plug-and-play approach called AGLA that effectively mitigates hallucinations in LVLMs by assembling global and local image features without additional training.

Findings

01. AGLA significantly reduces object hallucinations in LVLMs.
02. The method improves visual grounding and prompt relevance in generated responses.
03. AGLA demonstrates wide applicability across various tasks.

Abstract

Despite great success across various multimodal tasks, Large Vision-Language Models (LVLMs) often suffer from object hallucinations, in which generated textual responses are inconsistent with the actual objects in images. We examine different LVLMs and pinpoint that one root cause of object hallucinations lies in deficient attention to discriminative image features. Specifically, LVLMs often attend predominantly to prompt-irrelevant global features instead of prompt-relevant local features, which undermines their visual grounding capacity and leads to object hallucinations. We propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates hallucinations by simultaneously assembling global features for response generation and local features for visual discrimination. Specifically, we introduce an image-prompt matching scheme that captures…
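
The idea of assembling a global view and a local view at generation time can be pictured with a toy example. This is a minimal sketch under loud assumptions, not the paper's implementation: it assumes the two views are combined by a weighted sum of next-token logits, and the names `assemble_logits`, `alpha`, the three-word vocabulary, and all logit values are hypothetical and chosen only for illustration. How the local view is actually produced is not covered here, since the abstract above is truncated at that point.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assemble_logits(
    global_logits: List[float],
    local_logits: List[float],
    alpha: float = 1.0,
) -> List[float]:
    """Combine next-token logits from two views of the same image.

    ASSUMPTION: a weighted sum of logits followed by softmax. This
    combination rule is illustrative and is not taken from the text above.
    """
    if len(global_logits) != len(local_logits):
        raise ValueError("both views must score the same vocabulary")
    combined = [g + alpha * l for g, l in zip(global_logits, local_logits)]
    return softmax(combined)


# Hypothetical toy vocabulary and logits (made-up numbers).
vocab = ["dog", "cat", "frisbee"]
global_logits = [1.0, 1.4, 0.2]  # full image: slightly prefers the absent "cat"
local_logits = [2.0, 0.1, 0.3]   # prompt-relevant region: strongly prefers "dog"

baseline = softmax(global_logits)
assembled = assemble_logits(global_logits, local_logits, alpha=1.0)

baseline_top = vocab[baseline.index(max(baseline))]
assembled_top = vocab[assembled.index(max(assembled))]
print("global view only:", baseline_top)
print("assembled views: ", assembled_top)
```

In this toy setting the global view alone ranks the hallucinated object first, while adding the local view's logits moves the grounded object to the top. A larger `alpha` gives the local view more weight.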

Code & Models

Repositories

lackel/agla (PyTorch, official)

Taxonomy

Topics: Cell Image Analysis Techniques · Epilepsy research and treatment · Functional Brain Connectivity Studies

Methods: Softmax · Attention Is All You Need