SDCD: Structure-Disrupted Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models

Yuxuan Xia; Siheng Wang; Peng Li

arXiv:2601.03500 · cs.CV · January 8, 2026


TL;DR

This paper introduces SDCD, a training-free decoding method that reduces object hallucinations in large vision-language models by disrupting visual structure during decoding, leading to more accurate multimodal understanding.

Contribution

The paper proposes a novel, training-free contrastive decoding algorithm called SDCD that mitigates hallucinations by penalizing texture-driven biases in visual encoding.
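For orientation, contrastive decoding methods for LVLMs are commonly written in the form below, as in Visual Contrastive Decoding (VCD). This is a sketch that assumes SDCD follows the same general pattern; the summary does not state SDCD's exact formula. Here v is the original image, v' a structure-disrupted view of it, x the text prompt, α the contrast strength and β a plausibility cutoff (α, β and v' are assumed symbols, not taken from the paper).

```latex
p_{\mathrm{cd}}(y_t \mid v, x, y_{<t}) = \operatorname{softmax}\Big[(1+\alpha)\,\ell_\theta(y_t \mid v, x, y_{<t}) - \alpha\,\ell_\theta(y_t \mid v', x, y_{<t})\Big],
\quad y_t \in \mathcal{V}_{\mathrm{head}} = \Big\{ w : p_\theta(w \mid v, x, y_{<t}) \ge \beta \max_{w'} p_\theta(w' \mid v, x, y_{<t}) \Big\}
```

Since (1+α)ℓ(v) − αℓ(v') = ℓ(v) + α(ℓ(v) − ℓ(v')), a token whose logit does not drop when structure is removed gains nothing from the contrast term. This is one way to read "penalizing texture-driven biases": evidence that survives structure disruption is treated as texture-only evidence.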

Findings

1. SDCD significantly reduces hallucinations across multiple benchmarks.
2. SDCD improves the multimodal reasoning capabilities of LVLMs.
3. The method is training-free and easy to integrate into existing systems.

Abstract

Large Vision-Language Models (LVLMs) demonstrate significant progress in multimodal understanding and reasoning, yet object hallucination remains a critical challenge. While existing research focuses on mitigating language priors or high-level statistical biases, it often overlooks the internal complexities of the visual encoding process. We identify visual statistical bias, arising from the inherent Bag-of-Patches behavior of Vision Encoders under weak structural supervision, as a contributing factor to object hallucinations. Under this bias, models prioritize local texture features within individual patches over holistic geometric structures. This tendency may induce spurious visual confidence and result in hallucinations. To address this, we introduce a training-free algorithm called Structure-Disrupted Contrastive Decoding (SDCD), which performs contrastive calibration of…
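The Bag-of-Patches argument and the contrastive calibration can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration: the abstract is cut off before it says how SDCD disrupts structure, so random patch shuffling stands in as a hypothetical disruption. `bag_of_patches_features` is a deliberately position-blind toy encoder; real vision encoders add positional embeddings. `contrastive_logits` uses the VCD-style formula with assumed hyperparameters `alpha` and `beta`. All function names are hypothetical and do not come from the paper.

```python
import numpy as np


def shuffle_patches(image: np.ndarray, patch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical structure disruption: randomly permute non-overlapping patches.

    Texture inside each patch is kept; the global spatial arrangement is destroyed.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, gw, p, p, C) -> (gh*gw, p, p, C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(gh * gw, patch_size, patch_size, c)
    patches = patches[rng.permutation(gh * gw)]
    # Reassemble the permuted patches into an (H, W, C) image.
    out = patches.reshape(gh, gw, patch_size, patch_size, c).transpose(0, 2, 1, 3, 4)
    return out.reshape(h, w, c)


def bag_of_patches_features(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Toy position-blind encoder: average the flattened patches, ignoring where they sit."""
    h, w, c = image.shape
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, -1).astype(np.float64).mean(axis=0)


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def contrastive_logits(logits_orig, logits_disrupted, alpha=1.0, beta=0.1):
    """Assumed VCD-style calibration: favour tokens the intact image supports more
    than the structure-disrupted image does, restricted to plausible tokens."""
    logits_orig = np.asarray(logits_orig, dtype=np.float64)
    logits_disrupted = np.asarray(logits_disrupted, dtype=np.float64)
    cd = (1.0 + alpha) * logits_orig - alpha * logits_disrupted
    probs = softmax(logits_orig)
    keep = probs >= beta * probs.max()  # adaptive plausibility constraint
    return np.where(keep, cd, -np.inf)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32, 3))
    shuffled = shuffle_patches(img, 8, rng)
    # A position-blind encoder cannot tell the intact and shuffled images apart.
    print(np.allclose(bag_of_patches_features(img, 8), bag_of_patches_features(shuffled, 8)))  # True

    # Toy next-token logits over 3 tokens; token 1 stays strong even without structure.
    orig = [1.8, 2.0, -1.0]
    disrupted = [0.2, 2.1, -1.0]
    print(int(np.argmax(orig)), int(np.argmax(contrastive_logits(orig, disrupted))))  # 1 0
```

Shuffling patches keeps every local texture but removes the layout. A token whose logit survives the shuffle is therefore plausibly supported by texture alone, and subtracting the disrupted-view logits demotes it. This matches the intuition the abstract describes, though the paper's actual disruption and weighting may differ.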


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications