Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

Jiawei Chen; Dingkang Yang; Tong Wu; Yue Jiang; Xiaolu Hou; Mingcheng Li; Shunli Wang; Dongling Xiao; Ke Li; Lihua Zhang

arXiv:2406.10185·cs.CV·April 28, 2026·6 cites

TL;DR

This paper introduces Med-HallMark, a comprehensive benchmark and evaluation framework for detecting and assessing hallucinations in medical vision-language models, aiming to improve their reliability in healthcare.

Contribution

It presents Med-HallMark, the first benchmark dedicated to medical hallucination detection; the MediHall Score, a hierarchical scoring metric; and MediHallDetector, a specialized LVLM for hallucination detection.

Findings

01

The MediHall Score offers a nuanced assessment of hallucination impact.

02

MediHallDetector outperforms existing models in hallucination detection.

03

The Med-HallMark benchmark facilitates standardized evaluation of medical LVLMs.

Abstract

Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications, including medical visual question answering and imaging report generation. While these models inherit the robust capabilities of foundational Large Language Models (LLMs), they also inherit susceptibility to hallucinations, a significant concern in high-stakes medical contexts where the margin for error is minimal. However, there are currently no dedicated methods or benchmarks for hallucination detection and evaluation in the medical field. To bridge this gap, we introduce Med-HallMark, the first benchmark specifically designed for hallucination detection and evaluation within the medical multimodal domain. This benchmark provides multi-tasking hallucination support, multifaceted hallucination data, and hierarchical hallucination categorization. Furthermore, we propose the MediHall Score, a new…

Code & Models

Repositories

ydk122024/Med-HallMark
github
