TL;DR
This paper uncovers a phenomenon called Vocabulary Hijacking in LVLMs, where certain attention heads disproportionately focus on inert tokens that decode to unrelated words, leading to hallucinations, and proposes a training-free method to mitigate this issue.
Contribution
The paper introduces the concept of Vocabulary Hijacking, develops metrics to identify critical attention heads, and proposes HAVAE, a training-free method to reduce hallucinations in LVLMs.
Findings
HAVAE significantly reduces hallucinations across benchmarks.
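The proposed metrics identify attention heads that are critical for factual accuracy.
Vocabulary Hijacking reflects a rigid semantic collapse: Inert Tokens decode to the same fixed set of unrelated words (Hijacking Anchors) across layers.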
Abstract
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal tasks, yet their reliability is persistently undermined by hallucinations: generating text that contradicts the visual input. Recent studies often attribute these errors to inadequate visual attention. In this work, we analyze the attention mechanisms via the logit lens, uncovering a distinct anomaly we term Vocabulary Hijacking. We discover that specific visual tokens, defined as Inert Tokens, disproportionately attract attention. Crucially, when their intermediate hidden states are projected into the vocabulary space, they consistently decode to a fixed set of unrelated words (termed Hijacking Anchors) across layers, revealing a rigid semantic collapse. Leveraging this semantic rigidity, we propose Hijacking Anchor-Based Identification (HABI), a robust strategy to accurately localize these Inert Tokens. To…
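The logit-lens check described in the abstract can be illustrated with a minimal sketch. This is not the paper's HABI procedure, whose details are not given here. It only shows the general idea: project each token's per-layer hidden state into vocabulary space, then flag tokens whose top-1 decoded word stays the same across layers. The function names, the `min_fraction` threshold, and the toy data are all hypothetical. A real logit lens would normally also apply the model's final normalization before the unembedding, which is omitted here.

```python
from collections import Counter

def logit_lens_top1(hidden, unembed):
    """Project one hidden vector through the unembedding matrix
    (vocab_size rows x hidden_dim columns) and return the argmax vocab id."""
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
    return max(range(len(scores)), key=scores.__getitem__)

def find_rigid_tokens(hidden_states, unembed, min_fraction=1.0):
    """hidden_states[layer][token] is a hidden vector.

    Returns {token_index: vocab_id} for tokens whose top-1 decoded id is
    identical in at least `min_fraction` of the layers (assumed criterion).
    """
    num_layers = len(hidden_states)
    num_tokens = len(hidden_states[0])
    rigid = {}
    for t in range(num_tokens):
        decoded = [logit_lens_top1(hidden_states[l][t], unembed)
                   for l in range(num_layers)]
        vocab_id, count = Counter(decoded).most_common(1)[0]
        if count / num_layers >= min_fraction:
            rigid[t] = vocab_id
    return rigid

# Toy example (made-up numbers): 3 vocabulary entries, hidden size 2.
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hidden_states = [
    [[2.0, 0.1], [0.1, 2.0]],    # layer 0: token 0, token 1
    [[3.0, 0.5], [2.0, 0.1]],    # layer 1
    [[1.0, 0.2], [-1.0, -1.0]],  # layer 2
]
rigid_tokens = find_rigid_tokens(hidden_states, unembed)
print(rigid_tokens)
```

In this toy setup, token 0 decodes to the same vocabulary id in every layer and is flagged, while token 1 changes its decoded id from layer to layer and is not.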
