DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination

Xuan Gong; Tianshi Ming; Xinpeng Wang; Zhihua Wei

arXiv:2410.04514 · cs.CL · November 7, 2025

TL;DR

DAMRO is a training-free method that reduces object hallucination in LVLMs by filtering out the background image tokens that attract attention away from the referred objects.

Contribution

The paper introduces DAMRO, a novel training-free approach that leverages attention mechanisms to mitigate object hallucination in LVLMs, addressing a key flaw in visual encoders.

Findings

01. Significantly reduces object hallucination in LVLMs

02. Effective across multiple LVLM architectures and benchmarks

03. Improves alignment between attention focus and referred objects

Abstract

Despite the great success of Large Vision-Language Models (LVLMs), they inevitably suffer from hallucination. Both the visual encoder and the Large Language Model (LLM) decoder in LVLMs are Transformer-based, allowing the model to extract visual information and generate text outputs via attention mechanisms. We find that the attention distribution of the LLM decoder over image tokens is highly consistent with that of the visual encoder, and that both distributions tend to focus on particular background tokens rather than on the referred objects in the image. We attribute this unexpected attention distribution to an inherent flaw in the visual encoder itself, which misguides LLMs into overemphasizing redundant information and generating object hallucinations. To address the issue, we propose DAMRO, a novel training-free strategy that Dives into the Attention Mechanism of LVLMs to Reduce Object…
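
The abstract is cut off before it describes the procedure, so the following is only an illustrative sketch of the general idea of filtering out image tokens that attract disproportionate attention. It is not the DAMRO algorithm itself. The function names (`softmax`, `top_k_indices`, `mask_and_renormalize`), the top-k selection rule, and the toy scores are all assumptions introduced for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(weights, k):
    """Indices of the k largest weights, in descending order of weight.

    Assumption: the tokens with the largest attention are treated as
    candidate background outliers. The source does not specify this rule.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


def mask_and_renormalize(scores, masked):
    """Zero out the masked tokens and renormalize attention over the rest."""
    masked = set(masked)
    kept = [s for i, s in enumerate(scores) if i not in masked]
    if not kept:
        raise ValueError("cannot mask every token")
    m = max(kept)
    exps = [0.0 if i in masked else math.exp(s - m) for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy raw attention scores over six image tokens (hypothetical values).
# Tokens 1 and 4 play the role of high-attention background tokens.
scores = [0.1, 4.0, 0.3, 0.2, 3.5, 0.4]

weights = softmax(scores)
outliers = top_k_indices(weights, k=2)
filtered = mask_and_renormalize(scores, outliers)
```

In this toy setting the two dominant tokens absorb most of the attention mass before filtering; after masking, the remaining tokens share the full distribution. How candidate tokens are actually selected and used in DAMRO is described in the full paper, not in the truncated abstract above.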

Taxonomy

Topics: Hallucinations in medical conditions · Functional Brain Connectivity Studies · EEG and Brain-Computer Interfaces

Methods: Softmax · Attention Is All You Need · Focus