Mitigating Hallucination in Large Vision-Language Models through Aligning Attention Distribution to Information Flow
Jianfei Zhao, Feng Zhang, Xin Sun, Chong Feng

TL;DR
This paper proposes a method to reduce hallucinations in large vision-language models by aligning the attention distribution with the information flow, improving visual understanding and controllability.
Contribution
It introduces a two-stage optimization to align attention with information flow, significantly reducing hallucinations across multiple benchmarks and models.
Findings
Effective hallucination reduction demonstrated on three benchmarks.
Trade-off observed between hallucination reduction and detail richness.
Manual adjustment enables flexible control over model conservativeness.
Abstract
Due to the unidirectional masking mechanism, Decoder-Only models propagate information from left to right. LVLMs (Large Vision-Language Models) follow the same architecture, with visual information gradually integrated into semantic representations during forward propagation. Through systematic analysis, we observe that the majority of the visual information is absorbed into the semantic representations. However, the model's attention distribution does not exhibit sufficient emphasis on semantic representations. This misalignment between the attention distribution and the actual information flow undermines the model's visual understanding ability and contributes to hallucinations. To address this issue, we enhance the model's visual understanding by leveraging the core information embedded in semantic representations. Specifically, we identify attention heads that focus on core semantic…
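The abstract makes two observations. First, a causal mask makes information flow left to right. Second, the attention distribution may under-weight the positions where that information accumulates. A toy computation can illustrate both. The sketch below is not the authors' method. It applies a causally masked softmax to hand-picked scores for three hypothetical heads, measures what fraction of the last query's attention falls on positions labelled "semantic", and ranks the heads by that share. The position split (positions 0-1 visual, 2-3 semantic), the score values, and the helper names `semantic_share` and `rank_heads` are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

def causal_softmax(scores: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax under a unidirectional (causal) mask: query i may only
    attend to key positions j <= i, so information flows left to right."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]           # drop future positions j > i
        m = max(visible)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        # Masked (future) positions receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights

def semantic_share(weights: List[List[float]], query: int,
                   semantic_positions: Sequence[int]) -> float:
    """Fraction of one query's attention mass on the (hypothetical) semantic positions."""
    return sum(weights[query][j] for j in semantic_positions)

def rank_heads(head_scores: List[List[List[float]]], query: int,
               semantic_positions: Sequence[int]) -> Tuple[List[int], List[float]]:
    """Rank heads by semantic share (an illustrative criterion, not the paper's)."""
    shares = [semantic_share(causal_softmax(s), query, semantic_positions)
              for s in head_scores]
    order = sorted(range(len(shares)), key=lambda h: shares[h], reverse=True)
    return order, shares

def toy_scores(last_row: List[float]) -> List[List[float]]:
    """Hypothetical 4x4 score matrix: uniform rows except the final query's row."""
    return [[0.0] * 4 for _ in range(3)] + [last_row]

# Positions 0-1: visual tokens; positions 2-3: semantic tokens (assumed split).
heads = [
    toy_scores([0.0, 0.0, 0.0, 0.0]),  # head 0: uniform attention
    toy_scores([2.0, 2.0, 0.0, 0.0]),  # head 1: favours visual positions
    toy_scores([0.0, 0.0, 2.0, 2.0]),  # head 2: favours semantic positions
]
order, shares = rank_heads(heads, query=3, semantic_positions=[2, 3])
print(order, [round(s, 4) for s in shares])
```

With these toy scores, head 2 places roughly 88% of its attention on the semantic positions and head 1 only about 12%. The truncated abstract does not state the paper's actual head-selection criterion, so this ranking is only a stand-in for the idea of finding heads that focus on semantic representations.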
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
Methods: Softmax · Attention Is All You Need · Focus
