Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression

Hamidreza Dastmalchi; Aijun An; Ali Cheraghian; Hamed Barzamini

arXiv:2603.10470 · cs.CV · March 12, 2026

Open Access

TL;DR

This paper introduces CIPHER, a training-free method that uses counterfactual visual perturbations to effectively suppress hallucinations in large vision-language models, improving their faithfulness without sacrificing task performance.

Contribution

CIPHER is a training-free approach that explicitly targets vision-induced hallucinations in LVLMs, combining counterfactual image perturbations with a low-rank subspace correction. Prior training-free methods focus primarily on text-induced hallucinations.

Findings

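01. Significantly reduces hallucination rates across benchmarks.

02. Maintains high task performance while suppressing hallucinations.

03. Constructs OHC-25K (Object-Hallucinated Counterfactuals), a dataset of 25,000 diffusion-edited images that intentionally contradict the original ground-truth captions.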

Abstract

While large vision-language models (LVLMs) achieve strong performance on multimodal tasks, they frequently generate hallucinations -- unfaithful outputs misaligned with the visual input. To address this issue, we introduce CIPHER (Counterfactual Image Perturbations for Hallucination Extraction and Removal), a training-free method that suppresses vision-induced hallucinations via lightweight feature-level correction. Unlike prior training-free approaches that primarily focus on text-induced hallucinations, CIPHER explicitly targets hallucinations arising from the visual modality. CIPHER operates in two phases. In the offline phase, we construct OHC-25K (Object-Hallucinated Counterfactuals, 25,000 samples), a counterfactual dataset consisting of diffusion-edited images that intentionally contradict the original ground-truth captions. We pair these edited images with the unchanged…
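
The abstract is cut off before the correction step is described, so the following is only a generic sketch of what a low-rank subspace correction of the kind named under Contribution could look like, not the paper's procedure. It assumes that feature vectors are available for both the original and the counterfactual images, that the dominant directions of their differences are estimated with an SVD, and that those directions are projected out of a hidden state at inference time. The function names, the rank, and the synthetic data are all hypothetical.

```python
import numpy as np

def hallucination_subspace(clean_feats, counterfactual_feats, rank):
    """Estimate a low-rank basis from feature differences (hypothetical helper).

    Both inputs have shape (n_samples, dim). Returns an array of shape
    (rank, dim) whose rows are orthonormal directions.
    """
    diffs = counterfactual_feats - clean_feats
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:rank]

def project_out(hidden, basis):
    """Remove the component of `hidden` that lies in the span of `basis` rows."""
    return hidden - (hidden @ basis.T) @ basis

# Synthetic stand-in for real model features (assumption, illustration only).
rng = np.random.default_rng(0)
dim, n_samples, rank = 16, 200, 2

# Two orthonormal directions along which the "counterfactual" features shift.
true_dirs = np.linalg.qr(rng.normal(size=(dim, rank)))[0].T
clean = rng.normal(size=(n_samples, dim))
shift = (5.0 * rng.normal(size=(n_samples, rank))) @ true_dirs
counterfactual = clean + shift + 0.01 * rng.normal(size=(n_samples, dim))

basis = hallucination_subspace(clean, counterfactual, rank)

hidden = rng.normal(size=(dim,))
corrected = project_out(hidden, basis)
```

After the projection, `corrected` has no component along the estimated directions, while everything orthogonal to them is left unchanged. Which layer's features to use and how many directions to keep are design choices this sketch does not settle.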

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hallucinations in medical conditions · Digital Media Forensic Detection