MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models

Qiao Yan; Yuchen Yuan; Xiaowei Hu; Yihan Wang; Jiaqi Xu; Jinpeng Li; Chi-Wing Fu; Pheng-Ann Heng

arXiv:2502.20780 · cs.AI · March 3, 2025

TL;DR

MedHallTune is a large-scale benchmark designed to evaluate and reduce hallucinations in medical vision-language models, improving their reliability for healthcare applications.

Contribution

This work introduces MedHallTune, presented as the first comprehensive benchmark for assessing and mitigating hallucinations in medical vision-language models (VLMs), with over 100,000 images and 1,000,000 instruction pairs.

Findings

01 Fine-tuning with MedHallTune reduces hallucinations in the fine-tuned models.

02 Fine-tuning with MedHallTune also improves the models' zero-shot performance on medical VQA tasks.

03 Fine-tuning with MedHallTune enhances the trustworthiness of medical vision-language models.

Abstract

The increasing use of vision-language models (VLMs) in healthcare applications poses serious challenges related to hallucinations, in which a model generates results that seem plausible but are in fact incorrect. Such hallucinations can jeopardize clinical decision making, potentially harming diagnosis and treatment. In this work, we propose MedHallTune, a large-scale benchmark designed specifically to evaluate and mitigate hallucinations in medical VLMs. Comprising over 100,000 images and 1,000,000 instruction pairs, MedHallTune includes both hallucination and non-hallucination samples, each with ground-truth annotations. We conduct a comprehensive evaluation of current medical and general VLMs using MedHallTune, assessing their performance across key metrics, including clinical accuracy, relevance, detail level, and risk level. The experimental results show that…
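To make the evaluation setup more concrete, here is a minimal Python sketch of how per-sample scores on the four metrics named in the abstract (clinical accuracy, relevance, detail level, risk level) might be averaged, reported separately for hallucination and non-hallucination samples. The `ScoredSample` record, its field names, and the numeric score scale are hypothetical assumptions for illustration. They are not the MedHallTune data format or the authors' scoring protocol.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical identifiers for the four evaluation criteria listed in the abstract.
METRICS = ("clinical_accuracy", "relevance", "detail_level", "risk_level")


@dataclass
class ScoredSample:
    """One evaluated instruction pair (illustrative layout, not the official schema)."""
    image_id: str
    is_hallucination: bool      # True for a hallucination sample, False otherwise
    scores: dict[str, float]    # metric name -> numeric score (scale is an assumption)


def aggregate(samples: list[ScoredSample]) -> dict[str, dict[str, float]]:
    """Mean score per metric, computed separately for each sample type."""
    groups: dict[str, list[ScoredSample]] = {"hallucination": [], "non_hallucination": []}
    for s in samples:
        key = "hallucination" if s.is_hallucination else "non_hallucination"
        groups[key].append(s)

    result: dict[str, dict[str, float]] = {}
    for name, group in groups.items():
        if not group:
            continue  # skip an empty split instead of averaging nothing
        result[name] = {m: mean(s.scores[m] for s in group) for m in METRICS}
    return result


if __name__ == "__main__":
    # Made-up scores, used only to show the call pattern.
    demo = [
        ScoredSample("img_a", True, {"clinical_accuracy": 2.0, "relevance": 4.0,
                                      "detail_level": 3.0, "risk_level": 1.0}),
        ScoredSample("img_b", False, {"clinical_accuracy": 5.0, "relevance": 5.0,
                                       "detail_level": 4.0, "risk_level": 5.0}),
    ]
    print(aggregate(demo))
```

Keeping the two sample types apart shows whether a model handles hallucination prompts as well as ordinary ones, which a single pooled average would hide. The sketch only averages scores. It makes no assumption about whether a higher risk-level score is better or worse.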

Code & Models

Repositories

russellyq/medhalltune (official)