Attention-space Contrastive Guidance for Efficient Hallucination Mitigation in LVLMs

Yujin Jo; Sangyoon Bae; Taesup Kim

arXiv:2601.13707·cs.CV·April 21, 2026

TL;DR

This paper introduces Attention-space Contrastive Guidance (ACG), a training-free, single-pass method that improves visual grounding in large vision-language models (LVLMs) by applying contrastive guidance inside the self-attention layers, reducing hallucinations efficiently.

Contribution

ACG is a novel approach that mitigates hallucinations directly in the attention layers within a single forward pass, with no additional training.

Findings

01

ACG improves faithfulness over existing training-free methods.

02

ACG reduces latency by up to 2x compared to multi-pass methods.

03

Experiments on the CHAIR and POPE benchmarks demonstrate its effectiveness.

Abstract

Hallucinations in large vision-language models (LVLMs) often arise when language priors dominate over visual evidence, leading to object misidentification and visually inconsistent descriptions. We address this problem by framing hallucination mitigation as contrastive guidance that steers generation toward visually grounded and semantically faithful text. We propose Attention-space Contrastive Guidance (ACG), a training-free, single-pass method that operates directly in self-attention layers, where hallucination-inducing cross-modal biases emerge. ACG constructs both image-conditioned and approximate text-only attention paths within a single forward pass, enabling efficient guidance before errors accumulate at the output layer. Because this masking-based surrogate can introduce approximation bias, we further apply a lightweight orthogonal projection that suppresses components aligned…
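
The two attention paths described in the abstract can be illustrated with a toy, single-query sketch. This is not the authors' implementation: the function names, the `is_image` flags, the guidance weight `gamma`, and the extrapolation rule `cond + gamma * (cond - uncond)` (borrowed from classifier-free guidance) are all assumptions made for illustration. The orthogonal projection step is left out because the abstract is cut off before it is fully specified.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values, mask=None):
    """Single-query scaled dot-product attention.

    mask[i] == False hides key i (its score is set to -inf).
    At least one key must stay visible.
    """
    if mask is not None and not any(mask):
        raise ValueError("mask hides every key")
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    if mask is not None:
        scores = [s if keep else -math.inf for s, keep in zip(scores, mask)]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def guided_attention(query, keys, values, is_image, gamma=1.0):
    """Hypothetical attention-space guidance (assumed formula, see lead-in).

    cond   : attention over all tokens (image-conditioned path)
    uncond : attention with image tokens masked (approximate text-only path)
    Both paths reuse the same keys and values, so no second model pass is run.
    """
    cond = attend(query, keys, values)
    text_only = [not img for img in is_image]
    uncond = attend(query, keys, values, mask=text_only)
    # Push the output away from the text-only path, toward visual evidence.
    return [c + gamma * (c - u) for c, u in zip(cond, uncond)]

# Toy example: token 0 is an image token, tokens 1 and 2 are text tokens.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
is_image = [True, False, False]

plain = attend(query, keys, values)
guided = guided_attention(query, keys, values, is_image, gamma=1.0)
print("plain :", plain)
print("guided:", guided)
```

With `gamma=0.0` the sketch reduces to ordinary attention, and when no token is flagged as an image token the two paths coincide, so the guidance term vanishes.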
