Cross-Image Contrastive Decoding: Precise, Lossless Suppression of Language Priors in Large Vision-Language Models

Jianfei Zhao; Feng Zhang; Xin Sun; Lingxing Kong; Zhixing Tan; Chong Feng

arXiv:2505.10634 · cs.CV · September 17, 2025

TL;DR

This paper introduces Cross-Image Contrastive Decoding (CICD), a training-free method that reduces hallucinations in large vision-language models (LVLMs). It uses unrelated images as contrastive visual inputs to selectively suppress language priors, so that outputs stay consistent with the image without a loss in response quality.

Contribution

CICD leverages unrelated images for contrastive decoding and employs a dynamic selection mechanism to precisely suppress language priors without harming response quality.
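
The contrastive step can be illustrated with a minimal sketch. It assumes the generic contrastive decoding form, `(1 + alpha) * log p(token | target image) - alpha * log p(token | unrelated image)`, which is common in prior contrastive decoding work and is not confirmed here as CICD's exact formulation. The function names, the `alpha` weight, and the toy logits are all hypothetical, and the dynamic selection mechanism is omitted because this page does not describe it.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(logits_target, logits_unrelated, alpha=1.0):
    """Generic contrastive decoding scores (an assumed form, not the paper's exact rule).

    Tokens that stay likely even when the model sees an unrelated image are
    treated as driven by language priors and are pushed down.
    """
    lp_target = log_softmax(logits_target)
    lp_unrelated = log_softmax(logits_unrelated)
    return [(1 + alpha) * t - alpha * u for t, u in zip(lp_target, lp_unrelated)]


def greedy_token(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy vocabulary and next-token logits.
vocab = ["dog", "cat", "frisbee"]
logits_target = [2.0, 1.0, 2.2]     # target image: "frisbee" narrowly wins
logits_unrelated = [0.0, 0.5, 2.0]  # unrelated image: "frisbee" is still likely

plain_choice = vocab[greedy_token(logits_target)]
contrastive_choice = vocab[greedy_token(contrastive_scores(logits_target, logits_unrelated))]
print(plain_choice, contrastive_choice)
```

In this toy setup, "frisbee" scores highly for both images, so the contrast treats it as prior-driven and the choice shifts to "dog", which is supported only by the target image. Setting `alpha=0` recovers ordinary decoding on the target image.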

Findings

01. Reduces hallucinations in LVLMs effectively

02. Improves image captioning accuracy

03. Generalizes across multiple benchmarks

Abstract

Over-reliance on language priors is a major cause of hallucinations in Large Vision-Language Models (LVLMs), often leading to outputs that are linguistically plausible but visually inconsistent. Recent studies have explored contrastive decoding as a training-free solution. However, these methods typically construct contrastive visual inputs by perturbing the original image, resulting in distorted contrastive distributions, incomplete contrastive signals, and excessive suppression of language priors. Motivated by the observation that language priors tend to remain consistent across different images, we propose Cross-Image Contrastive Decoding (CICD), a simple yet effective training-free method that uses unrelated images as contrastive visual inputs. To address the issue of over-suppressing language priors, which can negatively affect the quality of generated responses, we further…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis