IKOD: Mitigating Visual Attention Degradation in Large Vision-Language Models

Jiabing Yang; Chenhang Cui; Yiyang Zhou; Yixiang Chen; Peng Xia; Ying Wei; Tao Yu; Yan Huang; Liang Wang

arXiv:2508.03469 · cs.CV · August 6, 2025


TL;DR

This paper introduces IKOD, a lightweight decoding method that reduces hallucinations in large vision-language models (LVLMs) by counteracting the degradation of visual attention during generation. It requires no additional training or external data.

Contribution

The paper proposes IKOD, a novel decoding strategy that mitigates attention degradation and hallucinations in LVLMs without additional training or external data.

Findings

01. IKOD effectively reduces hallucinations in LVLMs.

02. IKOD improves model performance on multiple benchmarks.

03. IKOD is computationally efficient and easy to integrate.

Abstract

Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated significant progress across multiple domains. However, these models still face the inherent challenge of integrating vision and language for collaborative inference, which often leads to "hallucinations", outputs that are not grounded in the corresponding images. Many efforts have been made to address these issues, but each comes with its own limitations, such as high computational cost or expensive dataset annotation. Recent research shows that LVLMs exhibit a long-term bias where hallucinations increase as the sequence length grows, yet the underlying cause remains poorly understood. Building on extensive research into attention mechanisms in LVLMs, we analyze the relationship between this long-term bias and visual attention. In our research, we identify a consistent phenomenon in current LVLMs: the model's…
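The kind of analysis the abstract alludes to, relating visual attention to sequence length, can be illustrated with a minimal sketch. This is not the authors' code or the IKOD method itself. It assumes, hypothetically, that for each generated token we already have one attention distribution over the context (for example, averaged over heads) and that the image tokens occupy a known contiguous index range. The function name `visual_attention_share` and the toy numbers are made up for illustration.

```python
from typing import List, Sequence


def visual_attention_share(
    attn_rows: Sequence[Sequence[float]],
    image_start: int,
    image_end: int,
) -> List[float]:
    """Return, for each decoding step, the fraction of attention mass
    that falls on image-token positions [image_start, image_end).

    attn_rows[t] is the (hypothetical) attention distribution of the
    t-th generated token over all context positions available at step t.
    """
    shares = []
    for row in attn_rows:
        total = sum(row)
        if total <= 0:
            raise ValueError("attention row must have positive mass")
        shares.append(sum(row[image_start:image_end]) / total)
    return shares


# Toy, hand-made attention rows (NOT measured from any model).
# Positions 0-1 stand in for image tokens; the context grows by one
# position per step as new text tokens are generated.
toy_rows = [
    [0.30, 0.30, 0.20, 0.20],
    [0.20, 0.20, 0.20, 0.20, 0.20],
    [0.10, 0.10, 0.20, 0.20, 0.20, 0.20],
]

shares = visual_attention_share(toy_rows, image_start=0, image_end=2)
for step, share in enumerate(shares):
    print(f"step {step}: image attention share = {share:.2f}")
```

In this toy input the share falls from step to step only because the numbers were chosen that way; with real models, the per-step attention rows would have to be extracted from the model's attention outputs before any such trend could be examined.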
