Deterministic Hallucination Detection in Medical VQA via Confidence-Evidence Bayesian Gain

Mohammad Asadi; Tahoura Nedaee; Jack W. O'Sullivan; Euan Ashley; Ehsan Adeli

arXiv:2603.21693 · cs.AI · March 24, 2026

Open Access

TL;DR

This paper introduces CEBaG, a deterministic method for detecting hallucinations in medical VQA models. It reads two signals from the model's own log-probabilities, token-level confidence and sensitivity to visual evidence, and requires no stochastic sampling or external models.

Contribution

The paper proposes CEBaG, a deterministic hallucination detection approach that outperforms existing stochastic methods in medical VQA tasks while avoiding the 10 to 20 sampled generations and the external natural language inference model that those methods require.

Findings

01. CEBaG achieves the highest AUC in 13 of 16 settings.

02. CEBaG improves over VASE by 8 AUC points on average.

03. CEBaG requires no stochastic sampling or external models.

Abstract

Multimodal large language models (MLLMs) have shown strong potential for medical Visual Question Answering (VQA), yet they remain prone to hallucinations, defined as generating responses that contradict the input image, posing serious risks in clinical settings. Current hallucination detection methods, such as Semantic Entropy (SE) and Vision-Amplified Semantic Entropy (VASE), require 10 to 20 stochastic generations per sample together with an external natural language inference model for semantic clustering, making them computationally expensive and difficult to deploy in practice. We observe that hallucinated responses exhibit a distinctive signature directly in the model's own log-probabilities: inconsistent token-level confidence and weak sensitivity to visual evidence. Based on this observation, we propose Confidence-Evidence Bayesian Gain (CEBaG), a deterministic hallucination…
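The two signals named in the abstract can be illustrated with a small sketch. This is not the CEBaG formula, which is cut off in the excerpt above; it is a toy illustration under stated assumptions. The function names (`confidence_inconsistency`, `evidence_gain`, `toy_score`, `auc`), the use of variance as the inconsistency measure, and the way the two signals are combined are all hypothetical. The sketch assumes per-token log-probabilities of the same answer are available from two deterministic forward passes, one with the image and one without.

```python
from statistics import pvariance


def confidence_inconsistency(token_logprobs):
    """Variance of per-token log-probabilities (assumed proxy for
    'inconsistent token-level confidence')."""
    return pvariance(token_logprobs)


def evidence_gain(logprobs_with_image, logprobs_without_image):
    """Mean per-token log-likelihood gain from conditioning on the image
    (assumed proxy for 'sensitivity to visual evidence')."""
    if len(logprobs_with_image) != len(logprobs_without_image):
        raise ValueError("both passes must score the same answer tokens")
    diffs = [a - b for a, b in zip(logprobs_with_image, logprobs_without_image)]
    return sum(diffs) / len(diffs)


def toy_score(logprobs_with_image, logprobs_without_image):
    """Hypothetical combination, NOT the paper's score: higher means more
    suspicious (inconsistent confidence, little gain from the image)."""
    return (confidence_inconsistency(logprobs_with_image)
            - evidence_gain(logprobs_with_image, logprobs_without_image))


def auc(scores, labels):
    """Rank-based AUC: chance that a hallucinated sample (label 1) scores
    higher than a non-hallucinated one (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both inputs come from single forward passes, so a score of this kind is deterministic: the same image, question, and answer always produce the same value, in contrast to methods that sample 10 to 20 generations.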

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Adversarial Robustness in Machine Learning