Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression
Hamidreza Dastmalchi, Aijun An, Ali Cheraghian, Hamed Barzamini

TL;DR
This paper introduces CIPHER, a training-free method that uses counterfactual visual perturbations to effectively suppress hallucinations in large vision-language models, improving their faithfulness without sacrificing task performance.
Contribution
CIPHER is the first approach to explicitly target vision-induced hallucinations in LVLMs using counterfactual image perturbations and a low-rank subspace correction method.
Findings
Significantly reduces hallucination rates across benchmarks.
Maintains high task performance while suppressing hallucinations.
Constructs OHC-25K, a 25,000-sample counterfactual dataset of diffusion-edited images, for systematic analysis.
Abstract
While large vision-language models (LVLMs) achieve strong performance on multimodal tasks, they frequently generate hallucinations -- unfaithful outputs misaligned with the visual input. To address this issue, we introduce CIPHER (Counterfactual Image Perturbations for Hallucination Extraction and Removal), a training-free method that suppresses vision-induced hallucinations via lightweight feature-level correction. Unlike prior training-free approaches that primarily focus on text-induced hallucinations, CIPHER explicitly targets hallucinations arising from the visual modality. CIPHER operates in two phases. In the offline phase, we construct OHC-25K (Object-Hallucinated Counterfactuals, 25,000 samples), a counterfactual dataset consisting of diffusion-edited images that intentionally contradict the original ground-truth captions. We pair these edited images with the unchanged…
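The abstract is cut off before it describes the correction step, so the following is only a rough sketch of what a low-rank, feature-level correction of this general kind could look like. It is not CIPHER's actual procedure. It assumes that paired features from original and counterfactual images are available as NumPy arrays, that the dominant directions of their difference are taken by SVD, and that hidden states are corrected by projecting those directions out. The function names `hallucination_subspace` and `suppress`, the `rank` and `alpha` parameters, and the choice of SVD are all hypothetical.

```python
import numpy as np


def hallucination_subspace(clean_feats, counterfactual_feats, rank):
    """Estimate a low-rank basis from paired feature differences.

    Hypothetical helper: both inputs have shape (n_samples, dim).
    Returns an array of shape (rank, dim) with orthonormal rows.
    """
    diffs = counterfactual_feats - clean_feats
    # Right singular vectors give the dominant directions of the shift
    # between clean and counterfactual features.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:rank]


def suppress(hidden, basis, alpha=1.0):
    """Subtract a fraction alpha of the component of `hidden` in the subspace.

    Assumption: `hidden` has shape (n_tokens, dim); alpha=1.0 removes the
    component entirely, smaller values only damp it.
    """
    coeffs = hidden @ basis.T          # coordinates inside the subspace
    return hidden - alpha * (coeffs @ basis)
```

In this sketch the basis would be estimated once offline and `suppress` applied to hidden states at inference time, which matches the training-free, two-phase structure the abstract describes. Which layer's features are used and how the rank is chosen are left open here.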
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hallucinations in medical conditions · Digital Media Forensic Detection
