VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing

Yanbin Huang; Yisen Li; Guiyao Tie; Xiaoye Qu; Pan Zhou; Hongfei Wang; Zhaofan Zou; Hao Sun; Xuelong Li

arXiv:2604.19412 · cs.CV · April 22, 2026

TL;DR

VCE is a post-hoc, label-free method that reduces object hallucination in LVLMs by analyzing how the model responds to contrastive visual perturbations and suppressing the hallucinatory tendencies this reveals, without fine-tuning.

Contribution

Introduces VCE, a novel visual contrastive editing technique that mitigates hallucinations in LVLMs without requiring labeled data or fine-tuning.

Findings

01. VCE significantly reduces object hallucination across benchmarks.

02. VCE maintains the original computational efficiency of LVLMs.

03. VCE operates as a scalable, label-free intervention.

Abstract

Large vision-language models (LVLMs) frequently suffer from Object Hallucination (OH), wherein they generate descriptions containing objects that are not actually present in the input image. This phenomenon is particularly problematic in real-world applications such as medical imaging and autonomous driving, where accuracy is critical. Recent studies suggest that the hallucination problem may stem from language priors: biases learned during pretraining that cause LVLMs to generate words based on their statistical co-occurrence. To mitigate this problem, we propose Visual Contrastive Editing (VCE), a novel post-hoc method that identifies and suppresses hallucinatory tendencies by analyzing the model's response to contrastive visual perturbations. Using Singular Value Decomposition (SVD), we decompose the model's activation patterns to isolate hallucination subspaces and apply targeted…
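
The SVD step described in the abstract can be illustrated with a small NumPy sketch on synthetic data. Everything here is an assumption made for illustration: the abstract is cut off before it says what is applied to the isolated subspace, so the projection step, the function names `hallucination_subspace` and `project_out`, the choice of `k`, and the synthetic activations are hypothetical and are not taken from the paper.

```python
import numpy as np


def hallucination_subspace(acts_clean, acts_perturbed, k):
    """Top-k directions of the activation shift caused by perturbing the image.

    acts_clean, acts_perturbed: arrays of shape (n_samples, d).
    Returns an array of shape (k, d) with orthonormal rows.
    """
    diff = acts_perturbed - acts_clean
    # Rows of vt are the right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:k]


def project_out(weight, basis):
    """Remove the span of `basis` from the output space of `weight`.

    weight: (d, d_in) matrix producing d-dimensional activations.
    This projection is an assumed form of editing, not the paper's.
    """
    projector = np.eye(weight.shape[0]) - basis.T @ basis
    return projector @ weight


# Synthetic stand-in for real LVLM activations (hypothetical data).
rng = np.random.default_rng(0)
n, d, d_in = 64, 16, 8
u = rng.normal(size=d)
u /= np.linalg.norm(u)                      # planted "hallucination" direction

acts_clean = rng.normal(size=(n, d))
shift = rng.uniform(1.0, 2.0, size=(n, 1)) * u
acts_perturbed = acts_clean + shift + 0.01 * rng.normal(size=(n, d))

basis = hallucination_subspace(acts_clean, acts_perturbed, k=1)
weight = rng.normal(size=(d, d_in))
edited = project_out(weight, basis)

alignment = abs(float(basis[0] @ u))        # close to 1 if the direction is recovered
residual = float(np.abs(basis @ edited).max())  # close to 0 after projection
```

In this sketch the edit is applied to the weights once, offline, so the edited layer has the same shape and inference cost as the original, which is at least consistent with the "zero-cost" framing in the title.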
