HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation

Haoyu Wang; Zitong Li

arXiv:2605.20469·cs.CV·May 21, 2026

TL;DR

HalluCXR introduces a benchmark for evaluating hallucinations in medical vision-language models, revealing high hallucination rates and proposing detection and mitigation strategies for safer clinical use.

Contribution

The paper presents a comprehensive benchmark, an annotation taxonomy, and ensemble mitigation methods to address hallucinations in medical VLMs for chest radiograph interpretation.

Findings

01

61.9–82.3% of outputs contain hallucinations

02

Normal radiographs attract the most severe hallucinations

03

Ensemble methods reduce hallucinations by up to 84.8%

Abstract

Vision-language models (VLMs) are increasingly used for medical image interpretation, yet they frequently hallucinate, generating clinically plausible but factually incorrect findings that pose direct patient safety risks. We introduce HalluCXR, a benchmark evaluating six architecturally diverse VLMs across 856 stratified MIMIC-CXR chest radiographs and three query types, yielding 15,408 model evaluations. An eight-category hallucination taxonomy with clinical severity ratings and a two-layer detection pipeline are validated against 250 human annotations (auto-detection F1=0.959; LLM judge F1=0.907). We find that 61.9–82.3% of outputs contain hallucinations, with clinically dangerous errors in up to 80.2%. Three key patterns emerge: normal radiographs paradoxically attract the most severe hallucinations, common findings are systematically over-fabricated while rare findings go…
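The evaluation count quoted in the abstract follows directly from the benchmark design (six models, 856 radiographs, three query types), and the F1 figures are the usual harmonic mean of precision and recall. The sketch below only illustrates that arithmetic: the function names are hypothetical, and the confusion counts in the usage note are made-up illustration values, not results from the paper.

```python
def evaluation_count(n_models: int, n_images: int, n_query_types: int) -> int:
    """Total model evaluations when every model answers every query type on every image."""
    return n_models * n_images * n_query_types


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from confusion counts: 2*TP / (2*TP + FP + FN).

    Returns 0.0 when there are no positives at all, to avoid dividing by zero.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Design figures taken from the abstract: 6 VLMs x 856 radiographs x 3 query types.
total = evaluation_count(6, 856, 3)
print(total)  # 15408
```

With illustrative counts such as `f1_score(tp=8, fp=2, fn=2)`, the function returns 0.8. How the paper aggregates F1 across its eight taxonomy categories is not stated in the abstract, so this sketch makes no assumption about it.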
