Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
Boyang Gong, Yu Zheng, Fanye Kong, Jie Zhou, and Jiwen Lu

TL;DR
This paper identifies visual attention inertia in multimodal large language models (MLLMs) as a cause of cognitive hallucinations and proposes a training-free method, IVE, that dynamically modulates attention to improve inference accuracy.
Contribution
The paper introduces Inertia-aware Visual Excitation (IVE), a training-free approach that counteracts visual attention inertia in MLLMs, improving their cognitive inference and reducing hallucinations.
Findings
IVE effectively reduces cognitive hallucinations across multiple benchmarks.
Attention inertia is a key factor in the persistence of hallucinations.
IVE improves compositional understanding in various MLLMs.
Abstract
Like a body at rest that stays at rest, we find that visual attention in multimodal large language models (MLLMs) exhibits pronounced inertia, remaining largely static once settled during early decoding steps and failing to support the compositional understanding required for cognitive inference. While existing hallucination mitigation methods mainly target perceptual hallucinations concerning object existence or attributes, they remain inadequate for such cognitive hallucinations that require inter-object relational deduction. Through token-wise attention analysis, we identify this visual inertia as a key factor: attention to semantically critical regions remains persistently focused and fails to dynamically support relational inference. We thereby propose a training-free Inertia-aware Visual Excitation (IVE) method that breaks this inertial pattern by modeling cognitive inference as…
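The token-wise attention analysis mentioned in the abstract can be illustrated with a small sketch. The metric below is an assumption made for illustration, not the paper's definition: it scores inertia as the mean cosine similarity between visual attention distributions at consecutive decoding steps. The function name `attention_inertia` and the `(steps, visual_tokens)` input layout are hypothetical.

```python
import numpy as np


def attention_inertia(attn: np.ndarray, eps: float = 1e-12) -> float:
    """Mean cosine similarity between visual attention at consecutive steps.

    Hypothetical metric for illustration only; not taken from the paper.
    attn: non-negative weights with shape (steps, visual_tokens).
    Returns a value in [0, 1]; values near 1 mean attention barely moves.
    """
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 2 or attn.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two decoding steps")
    # Normalize each step's attention vector to unit length.
    norms = np.linalg.norm(attn, axis=1, keepdims=True)
    unit = attn / np.maximum(norms, eps)
    # Cosine similarity between step t and step t + 1.
    sims = np.sum(unit[:-1] * unit[1:], axis=1)
    return float(sims.mean())


# Attention that never moves after the first step (fully inertial).
static = np.tile([0.7, 0.2, 0.1, 0.0], (5, 1))
# Attention that jumps to a different visual token at every step.
shifting = np.eye(4)

print(attention_inertia(static))
print(attention_inertia(shifting))
```

Under this assumed metric, the static pattern scores close to 1 and the shifting pattern scores 0, so a high score would flag the kind of settled attention the abstract describes.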
