Global Context or Local Detail? Adaptive Visual Grounding for Hallucination Mitigation

Yubo Jiang; Xin Yang; Abudukelimu Wuerkaixi; Zheming Yuan; Xuxin Cheng; Fengying Xie; Zhiguo Jiang; Cao Liu; Ke Zeng; Haopeng Zhang

arXiv:2604.24396 · cs.CV · April 28, 2026

TL;DR

This paper introduces PND, a training-free inference method that reduces object hallucination in vision-language models by enforcing visual fidelity through a dual-path contrast mechanism.

Contribution

The paper presents PND, a novel inference framework that mitigates hallucination in VLMs without retraining, by correcting attention deficits and contrasting visual evidence during decoding.

Findings

01. PND improves accuracy by up to 6.5% on benchmarks.

02. It substantially reduces object hallucination in VLMs.

03. PND enhances descriptive detail without retraining models.

Abstract

Vision-Language Models (VLMs) are frequently undermined by object hallucination (generating content that contradicts visual reality) due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervenes directly in the decoding process to enforce visual fidelity. PND is motivated by our key finding of a critical attention deficit in VLMs, where visual features are empirically under-weighted. Our framework corrects this via a dual-path contrast. The positive path amplifies salient visual evidence using multi-layer attention to encourage faithful descriptions, directly counteracting the attention deficit. Simultaneously, the negative path identifies and degrades the core object's features to create a strong counterfactual, which penalizes ungrounded, prior-dominant generation. By contrasting the model's outputs…
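
To make the dual-path idea concrete, here is a minimal sketch of one contrastive decoding step. The abstract is cut off before it says how the two paths' outputs are combined, so the rule used here, (1 + alpha) * log p_pos - alpha * log p_neg, is an assumption borrowed from common contrastive decoding practice and may differ from what PND actually does. The function names, the `alpha` parameter, the toy vocabulary, and the logit values are all hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def contrast_logits(pos_logits, neg_logits, alpha=1.0):
    """Combine next-token scores from the two paths.

    ASSUMPTION: this combination rule is a generic contrastive decoding
    form, not taken from the paper. It boosts tokens whose probability
    drops when the core object's features are degraded (visually
    grounded tokens) and penalizes tokens that stay likely regardless
    (prior-driven tokens).
    """
    pos = log_softmax(pos_logits)
    neg = log_softmax(neg_logits)
    return [(1.0 + alpha) * p - alpha * n for p, n in zip(pos, neg)]

def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy example: the image shows a dog.
vocab = ["dog", "cat", "frisbee"]
pos_logits = [2.0, 2.1, 0.5]  # positive path: "cat" narrowly wins on prior
neg_logits = [0.2, 2.1, 0.5]  # negative path: dog degraded, "cat" unchanged

plain_choice = vocab[greedy_pick(pos_logits)]
contrast_choice = vocab[greedy_pick(contrast_logits(pos_logits, neg_logits))]
```

In this toy setup, "cat" scores the same whether or not the dog is visible, which marks it as prior-driven, so the contrast moves the greedy choice from "cat" to "dog". Setting `alpha=0` reduces the step to ordinary decoding on the positive path.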
