Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding

Yubo Jiang; Yitong An; Xin Yang; Abudukelimu Wuerkaixi; Xuxin Cheng; Fengying Xie; Zhiguo Jiang; Cao Liu; Ke Zeng; and Haopeng Zhang

arXiv:2605.06679 · cs.LG · May 11, 2026

TL;DR

This paper presents PND, a training-free inference method that improves vision-language model outputs by balancing visual evidence and linguistic priors, reducing hallucinations.

Contribution

The paper introduces PND, an inference framework that enforces visual fidelity in VLMs without retraining by contrasting a positive and a negative decoding path.

Findings

1. PND achieves state-of-the-art results on the POPE, MME, and CHAIR benchmarks.

2. PND reduces object hallucination in vision-language models.

3. PND operates without additional training or fine-tuning.

Abstract

Vision-Language Models (VLMs) are frequently undermined by object hallucination, generating content that contradicts visual reality, due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervenes directly in the decoding process to enforce visual fidelity. PND is motivated by our finding of an attention imbalance in VLMs, where visual features are under-weighted. Our framework introduces a dual-path contrast: a positive path that amplifies visual evidence and a negative path that constructs counterfactuals to penalize prior-dominant generation. By contrasting outputs from both paths during decoding, PND steers generation toward visually grounded results. Experiments on POPE, MME, and CHAIR demonstrate state-of-the-art performance without retraining.
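
The abstract does not state how the two paths are combined, so the following is only a minimal sketch of one decoding step under an assumed rule. It uses the generic contrastive-decoding form, (1 + alpha) * log p_pos - alpha * log p_neg, which may differ from PND's actual formulation. The function names, the weight `alpha`, the toy vocabulary, and the logit values are all hypothetical and chosen for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrast_step(pos_logits, neg_logits, alpha=1.0):
    """Pick the next token index by contrasting two decoding paths.

    ASSUMPTION: this is the generic contrastive-decoding rule, not
    necessarily the rule used by PND. `alpha` is a hypothetical weight;
    alpha = 0 reduces to greedy decoding on the positive path alone.
    """
    pos = log_softmax(pos_logits)
    neg = log_softmax(neg_logits)
    # Reward tokens supported by the positive (visually amplified) path and
    # penalize tokens the negative (prior-dominant) path also favors.
    scores = [(1 + alpha) * p - alpha * n for p, n in zip(pos, neg)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example with made-up logits: the language prior favors "cat",
# while the visual evidence weakly supports "dog".
vocab = ["dog", "cat", "frisbee"]
pos_logits = [1.9, 2.0, 0.5]  # positive path: "cat" still narrowly ahead
neg_logits = [0.5, 2.5, 0.4]  # negative path: "cat" dominates via the prior

greedy = vocab[contrast_step(pos_logits, neg_logits, alpha=0.0)]
chosen = vocab[contrast_step(pos_logits, neg_logits, alpha=1.0)]
print(greedy, chosen)
```

In this toy setting, greedy decoding on the positive path alone selects "cat", while the contrasted scores select "dog", because the negative path assigns "cat" a high probability that is then subtracted. In a real VLM both sets of logits would come from forward passes of the same model at each decoding step.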
