Mitigating Object Hallucination via Concentric Causal Attention
Yun Xing, Yiheng Li, Ivan Laptev, Shijian Lu

TL;DR
This paper identifies the link between Rotary Position Encoding and object hallucination in LVLMs, and proposes Concentric Causal Attention to improve factual alignment with images.
Contribution
It introduces Concentric Causal Attention, a novel positional alignment method that reduces object hallucination by addressing RoPE's long-term decay effects in LVLMs.
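The summary above does not spell out how CCA assigns positions to visual tokens. As a rough illustration only, the sketch below shows one plausible reading of "concentric": each cell of a square grid of visual tokens receives the index of the ring it lies on, counted from the outer border inward, so the largest position index grows with the grid side rather than with the token count. The function names `ring_index` and `concentric_positions` and the grid sizes are hypothetical and are not taken from the paper.

```python
def ring_index(row, col, size):
    """Ring number of a grid cell, counting from 0 at the outer border."""
    return min(row, col, size - 1 - row, size - 1 - col)


def concentric_positions(size):
    """Assign every cell of a size x size grid the index of its ring.

    This is an illustrative assumption about what a concentric position
    assignment could look like, not the authors' implementation.
    """
    return [[ring_index(r, c, size) for c in range(size)] for r in range(size)]


# Small hypothetical grid: the outer border gets 0, the centre cells get 2.
grid = concentric_positions(6)
for row in grid:
    print(row)

# Raster-scan order on a hypothetical 24 x 24 grid needs 576 distinct
# positions; the ring assignment needs only 12.
largest = max(max(row) for row in concentric_positions(24))
```

Under this reading, tokens on the same ring share a position index, which shortens the largest positional gap between any visual token and the tokens that follow the image in the sequence.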
Findings
CCA significantly reduces object hallucination in LVLMs.
CCA outperforms existing mitigation strategies on multiple benchmarks.
The study reveals the impact of RoPE decay on visual-instruction interaction.
Abstract
Recent Large Vision Language Models (LVLMs) present remarkable zero-shot conversational and reasoning capabilities given multimodal queries. Nevertheless, they suffer from object hallucination, a phenomenon where LVLMs are prone to generate textual responses not factually aligned with image inputs. Our pilot study reveals that object hallucination is closely tied with Rotary Position Encoding (RoPE), a widely adopted positional dependency modeling design in existing LVLMs. Due to the long-term decay in RoPE, LVLMs tend to hallucinate more when relevant visual cues are distant from instruction tokens in the multimodal input sequence. Additionally, we observe a similar effect when reversing the sequential order of visual tokens during multimodal alignment. Our tests indicate that long-term decay in RoPE poses challenges to LVLMs while capturing visual-instruction interactions across long…
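The long-term decay mentioned in the abstract can be seen in a small numerical experiment. The sketch below is a minimal NumPy illustration, not the authors' code: it applies a standard rotary encoding to all-ones query and key vectors and measures how their dot product changes as the key moves further from the query. The head dimension of 128, the base of 10000, and the distance range of 1024 are assumptions chosen for illustration.

```python
import numpy as np


def rope_rotate(x, pos, base=10000.0):
    """Apply rotary position encoding to a 1-D vector x at integer position pos."""
    d = x.shape[0]
    # One rotation frequency per pair of dimensions.
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


def score_vs_distance(d=128, max_dist=1024):
    """Dot product between a rotated query and keys at growing distances."""
    q = np.ones(d)
    k = np.ones(d)
    q_rot = rope_rotate(q, max_dist)  # query sits at the last position
    return np.array(
        [q_rot @ rope_rotate(k, max_dist - dist) for dist in range(max_dist)]
    )


scores = score_vs_distance()
near = scores[:64].mean()   # keys within 64 positions of the query
far = scores[512:].mean()   # keys 512 or more positions away
print(f"score at distance 0: {scores[0]:.1f}")
print(f"mean score, near keys: {near:.1f}; far keys: {far:.1f}")
```

With identical content vectors, the only thing that changes between keys is their distance from the query, so the drop from the near average to the far average isolates the positional effect. The scores oscillate rather than fall monotonically, which is why the comparison uses averages over ranges.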
Taxonomy
Topics: Functional Brain Connectivity Studies · Face Recognition and Perception · Psychology of Moral and Emotional Judgment
Methods: Softmax · Attention Is All You Need
