Attention-space Contrastive Guidance for Efficient Hallucination Mitigation in LVLMs
Yujin Jo, Sangyoon Bae, Taesup Kim

TL;DR
This paper introduces Attention-space Contrastive Guidance (ACG), a training-free method that improves visual grounding in LVLMs by applying contrastive guidance inside self-attention layers, reducing hallucinations efficiently.
Contribution
ACG is a novel, single-pass approach that operates in self-attention layers and mitigates hallucinations without additional training or multiple inference passes.
Findings
ACG improves faithfulness over existing training-free methods.
ACG reduces latency by up to 2x compared to multi-pass methods.
Experiments on the CHAIR and POPE benchmarks demonstrate effectiveness.
Abstract
Hallucinations in large vision-language models (LVLMs) often arise when language priors dominate over visual evidence, leading to object misidentification and visually inconsistent descriptions. We address this problem by framing hallucination mitigation as contrastive guidance that steers generation toward visually grounded and semantically faithful text. We propose Attention-space Contrastive Guidance (ACG), a training-free, single-pass method that operates directly in self-attention layers, where hallucination-inducing cross-modal biases emerge. ACG constructs both image-conditioned and approximate text-only attention paths within a single forward pass, enabling efficient guidance before errors accumulate at the output layer. Because this masking-based surrogate can introduce approximation bias, we further apply a lightweight orthogonal projection that suppresses components aligned…
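The two-path idea in the abstract can be illustrated with a toy single-head attention in NumPy. This is a rough sketch under stated assumptions, not the authors' implementation: the function name `contrastive_attention`, the guidance strength `alpha`, and the update `full + alpha * (full - text_only)` (the generic contrastive-guidance form) are all hypothetical choices made for illustration. The orthogonal projection step is left out because its description is truncated in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries set to -inf become weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def contrastive_attention(q, k, v, image_mask, alpha=1.0):
    """Toy single-head attention with an assumed contrastive update.

    q: (Tq, d) queries; k, v: (Tk, d) keys and values.
    image_mask: (Tk,) bool, True where the key belongs to an image token.
    alpha: hypothetical guidance strength (alpha=0 recovers plain attention).
    """
    assert not image_mask.all(), "need at least one text token to attend to"
    d = q.shape[-1]
    # Scores are computed once and reused by both paths (single pass).
    scores = q @ k.T / np.sqrt(d)
    # Image-conditioned path: attend over all tokens.
    full = softmax(scores) @ v
    # Approximate text-only path: mask out image keys before the softmax.
    masked_scores = np.where(image_mask[None, :], -np.inf, scores)
    text_only = softmax(masked_scores) @ v
    # Assumed guidance rule: push the output away from the text-only path.
    guided = full + alpha * (full - text_only)
    return guided, full, text_only

# Usage: 2 image tokens followed by 3 text tokens, random features.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 8))
image_mask = np.array([True, True, False, False, False])
guided, full, text_only = contrastive_attention(q, k, v, image_mask, alpha=0.5)
print(guided.shape)
```

Both paths share the same score matrix, which is why a masking-based surrogate needs no second forward pass. It is only an approximation of a true text-only run, since the queries, keys, and values were themselves computed with the image present.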
