Mitigating Hallucination in Large Vision-Language Models through Aligning Attention Distribution to Information Flow

Jianfei Zhao; Feng Zhang; Xin Sun; Chong Feng

arXiv:2505.14257 · cs.CV · September 24, 2025

TL;DR

This paper proposes a method to reduce hallucinations in large vision-language models by aligning attention distribution with information flow, improving visual understanding and controllability.

Contribution

The paper introduces a two-stage optimization that aligns attention with information flow, significantly reducing hallucinations across multiple benchmarks and models.

Findings

01. Effective hallucination reduction demonstrated on three benchmarks.

02. Trade-off observed between hallucination reduction and detail richness.

03. Manual adjustment enables flexible control over model conservativeness.

Abstract

Due to the unidirectional masking mechanism, Decoder-Only models propagate information from left to right. LVLMs (Large Vision-Language Models) follow the same architecture, with visual information gradually integrated into semantic representations during forward propagation. Through systematic analysis, we observe that the majority of the visual information is absorbed into the semantic representations. However, the model's attention distribution does not exhibit sufficient emphasis on semantic representations. This misalignment between the attention distribution and the actual information flow undermines the model's visual understanding ability and contributes to hallucinations. To address this issue, we enhance the model's visual understanding by leveraging the core information embedded in semantic representations. Specifically, we identify attention heads that focus on core semantic…
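
The two ideas the abstract relies on, unidirectional masking and the share of attention placed on image positions versus later text positions, can be illustrated with a toy example. This is only a minimal sketch, not the authors' method: the token layout (image tokens first, text tokens after), the raw scores, and the names `causal_attention_row`, `attention_mass_split`, and `NUM_IMAGE_TOKENS` are all hypothetical and chosen for illustration.

```python
import math


def causal_attention_row(scores, query_index):
    """Softmax over positions 0..query_index; later positions get weight 0.

    This mimics unidirectional (causal) masking: a query token can only
    read from itself and the tokens to its left.
    """
    visible = scores[: query_index + 1]
    peak = max(visible)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in visible]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Masked (future) positions receive exactly zero attention.
    return weights + [0.0] * (len(scores) - len(visible))


def attention_mass_split(weights, num_image_tokens):
    """Return (mass on image positions, mass on the remaining text positions)."""
    image_mass = sum(weights[:num_image_tokens])
    text_mass = sum(weights[num_image_tokens:])
    return image_mass, text_mass


# Hypothetical layout: 4 image tokens followed by 4 text tokens.
NUM_IMAGE_TOKENS = 4
# Hypothetical raw attention scores for one query (not real model outputs).
raw_scores = [2.0, 2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5]

row = causal_attention_row(raw_scores, query_index=5)
image_mass, text_mass = attention_mass_split(row, NUM_IMAGE_TOKENS)
print(f"image mass: {image_mass:.3f}, text mass: {text_mass:.3f}")
```

In this toy setting most of the attention mass lands on the image positions, which is the kind of distribution the abstract describes as misaligned with where the visual information has actually been absorbed. With a real model, the weights would come from its attention maps rather than from hand-written scores.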

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning

Methods: Softmax · Attention Is All You Need · Focus