Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding

Zahra Mahdavi; Zahra Khodakaramimaghsoud; Hooman Khaloo; Sina Bakhshandeh Taleshani; Erfan Hashemi; Javad Mirzapour Kaleybar; Omid Nejati Manzari

arXiv:2512.01922 · cs.CV · December 2, 2025

Open Access

TL;DR

Med-VCD is a novel decoding method for medical vision-language models that reduces hallucinations and improves factual accuracy without slowing down inference, by selectively focusing on visually relevant tokens.

Contribution

Introduces Med-VCD, a sparse visual-contrastive decoding approach whose token-sparsification strategy improves the reliability of medical LVLMs without the time overhead of secondary decoding.

Findings

1. Raises factual accuracy by 13% on average
2. Improves hallucination accuracy by 6%
3. Effective across diverse medical imaging tasks

Abstract

Large vision-language models (LVLMs) are now central to healthcare applications such as medical visual question answering and imaging report generation. Yet these models remain vulnerable to hallucination: outputs that appear plausible but are in fact incorrect. In the natural image domain, several decoding strategies have been proposed to mitigate hallucinations by reinforcing visual evidence, but most rely on secondary decoding or rollback procedures that substantially slow inference. Moreover, existing solutions are often domain-specific and may introduce misalignment between modalities or between generated and ground-truth content. We introduce Med-VCD, a sparse visual-contrastive decoding method that mitigates hallucinations in medical LVLMs without the time overhead of secondary decoding. Med-VCD incorporates a novel token-sparsification strategy that selects visually informed…
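For readers unfamiliar with the term, the sketch below illustrates the generic visual contrastive decoding idea from the natural image domain that the abstract refers to. It is not the Med-VCD algorithm: the token-sparsification step is not modeled, and the function names, the `alpha` and `beta` parameters, and the toy logits are all assumptions made for illustration. At each step, next-token scores conditioned on the full image are contrasted with scores conditioned on a weakened visual input, so tokens that depend on visual evidence are favored over tokens driven mainly by language priors.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_scores(logits_full, logits_degraded, alpha=1.0, beta=0.1):
    """Generic contrastive scores for one decoding step (illustrative only).

    logits_full:     next-token logits given the original image
    logits_degraded: next-token logits given a weakened visual input
    alpha:           contrast strength (assumed hyperparameter)
    beta:            plausibility threshold (assumed hyperparameter)
    """
    lp_full = log_softmax(logits_full)
    lp_deg = log_softmax(logits_degraded)
    # Keep only tokens whose full-image probability is at least
    # beta times the probability of the most likely token.
    cutoff = max(lp_full) + math.log(beta)
    scores = []
    for f, d in zip(lp_full, lp_deg):
        if f >= cutoff:
            scores.append((1 + alpha) * f - alpha * d)
        else:
            scores.append(float("-inf"))
    return scores

def pick_token(scores):
    """Greedy choice: index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

# Toy three-word vocabulary with made-up logits.
vocab = ["normal", "effusion", "fracture"]
logits_full = [2.0, 1.8, -1.0]      # with the image
logits_degraded = [2.5, 0.5, -1.0]  # visual evidence weakened

scores = contrastive_scores(logits_full, logits_degraded)
greedy_choice = vocab[pick_token(logits_full)]
contrastive_choice = vocab[pick_token(scores)]
```

In this toy case plain greedy decoding picks "normal", which is also the token the model prefers when the visual evidence is weakened, while the contrastive score picks "effusion", the token whose likelihood rises when the full image is available. Note that this generic form needs a second forward pass for the degraded input, which is the kind of overhead the abstract says Med-VCD is designed to avoid.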

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Adversarial Robustness in Machine Learning