VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering

Jiayi Chen; Benteng Ma; Zehui Liao; Winston Chong; Yasmeen George; Jianfei Cai

arXiv:2605.20772·cs.CV·May 21, 2026

TL;DR

This paper introduces VIHD, a hallucination detection method for medical visual question answering that applies targeted visual token masking to calibrate semantic entropy, with the aim of detecting hallucinated responses more effectively.

Contribution

VIHD leverages visual dependency probing and targeted visual intervention decoding to enhance hallucination detection in medical VQA models, outperforming existing methods.

Findings

01

VIHD outperforms state-of-the-art hallucination detection methods on three medical VQA benchmarks.

02

Visual dependency plays a crucial role in effective hallucination detection.

03

Calibrated semantic entropy (CSE) provides a reliable signal for hallucination identification.
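
To make the semantic entropy signal concrete, the following is a minimal sketch of plain (uncalibrated) semantic entropy over several sampled answers to one question. It is not the authors' implementation, and this page does not describe how VIHD performs the calibration, so that step is omitted. The function names and the `normalize` helper are hypothetical; `normalize` is a crude stand-in for a real semantic-equivalence check between answers.

```python
import math
from collections import Counter


def semantic_entropy(answers, canonicalize):
    """Shannon entropy (in nats) over meaning clusters of sampled answers.

    `canonicalize` maps each answer to a cluster key. It is an assumed,
    illustrative stand-in for a semantic-equivalence check.
    """
    clusters = Counter(canonicalize(a) for a in answers)
    total = sum(clusters.values())
    # Sum p * log(1/p) over clusters, with p the empirical cluster frequency.
    return sum((n / total) * math.log(total / n) for n in clusters.values())


def normalize(answer):
    # Hypothetical clustering rule: ignore case, outer whitespace, final period.
    return answer.strip().lower().rstrip(".")


# Answers that agree in meaning give zero entropy; disagreement raises it.
consistent = semantic_entropy(["Yes", "yes.", " YES"], normalize)
divided = semantic_entropy(["Yes", "No", "yes.", "no"], normalize)
```

Under this sketch, higher entropy means the sampled answers disagree in meaning, which is the kind of uncertainty signal that semantic-entropy-based detectors build on.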

Abstract

While medical Multimodal Large Language Models (MLLMs) have shown promise in assisting diagnosis, they still frequently generate hallucinated responses that appear linguistically plausible but lack visual evidence. Such hallucinations pose risks to clinical decision-making and necessitate effective detection. Existing introspective detection methods primarily perform uncertainty estimation or logical verification by analyzing model responses conditioned on original or perturbed inputs. However, such external perturbations are often heuristic and context-agnostic, which overlooks the internal cross-modal dependency between generated tokens and related visual tokens during decoding. To address this issue, we propose VIHD, a Visual Intervention-based Hallucination Detection method that leverages targeted visual token masking to calibrate semantic entropy for more effective hallucination…

Code & Models

Repositories

Jiayi-Chen-AU/VIHD (GitHub)
