Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation

Yebo Wu; Han Jin; Zhijiang Guo; Li Li

arXiv:2604.10071 · cs.CV · April 14, 2026

TL;DR

This paper proposes DaID, a contrastive decoding framework that reduces hallucinations in multimodal large language models by dynamically calibrating token generation using visual attention and internal model discrepancies.

Contribution

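The paper introduces Dual-Anchor Introspective Decoding (DaID), a contrastive decoding method that uses visual attention distributions to select, for each generated token, a Spotlight layer that amplifies visual factual signals and a Shadow layer that suppresses textual inertia.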

Findings

01. DaID significantly reduces hallucinations across multiple benchmarks.

02. DaID enhances reasoning capabilities of MLLMs.

03. The method effectively calibrates token generation using visual attention signals.

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable reasoning capabilities yet continue to suffer from hallucination, where generated text contradicts visual content. In this paper, we introduce Dual-Anchor Introspective Decoding (DaID), a novel contrastive decoding framework that dynamically calibrates each token generation by mining the model's internal perceptual discrepancies. Specifically, DaID identifies a Spotlight layer to amplify visual factual signals and a Shadow layer to suppress textual inertia. By leveraging visual attention distributions to guide this dual-anchor selection process, our method ensures precise, token-specific adaptation. Experimental results across multiple benchmarks and MLLMs demonstrate that DaID significantly mitigates hallucination while enhancing general reasoning capabilities.
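
The abstract does not give DaID's selection rule or scoring formula, so the following is only a minimal sketch of the general idea of layer-contrastive decoding with two anchors, not the authors' implementation. Everything here is an assumption made for illustration: the function names (`select_anchors`, `dual_anchor_scores`), the rule that the Spotlight layer is the one with the highest visual attention mass and the Shadow layer the one with the lowest, the additive log-probability contrast, and the weight `alpha`.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def select_anchors(visual_attention_per_layer):
    """Return (spotlight, shadow) layer indices.

    Assumed rule (not from the paper): the Spotlight layer has the most
    attention mass on image tokens, the Shadow layer has the least.
    """
    layers = range(len(visual_attention_per_layer))
    spotlight = max(layers, key=lambda i: visual_attention_per_layer[i])
    shadow = min(layers, key=lambda i: visual_attention_per_layer[i])
    return spotlight, shadow


def dual_anchor_scores(layer_logits, visual_attention_per_layer, alpha=1.0):
    """Adjust final-layer log-probabilities for one decoding step.

    layer_logits: per-layer vocabulary logits (last entry is the final layer).
    Hypothetical scoring: final + alpha * (spotlight - shadow), in log space.
    """
    spotlight, shadow = select_anchors(visual_attention_per_layer)
    final_lp = log_softmax(layer_logits[-1])
    spot_lp = log_softmax(layer_logits[spotlight])
    shadow_lp = log_softmax(layer_logits[shadow])
    return [f + alpha * (s - d) for f, s, d in zip(final_lp, spot_lp, shadow_lp)]


# Toy step: 3 layers, vocabulary of 3 tokens. The final layer ties tokens 0 and 1.
layer_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]]
visual_attention = [0.1, 0.6, 0.3]  # made-up attention mass on image tokens per layer
scores = dual_anchor_scores(layer_logits, visual_attention)
next_token = max(range(len(scores)), key=lambda i: scores[i])
```

In this toy step the tie in the final layer is broken toward the token preferred by the layer that attends most to the image and away from the token preferred by the layer that attends least. Because the anchors are re-selected from the attention values at every step, the adjustment is token-specific, which is the property the abstract emphasizes.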
