Comparative Evaluation of Generative AI Models for Chest Radiograph Report Generation in the Emergency Department
Woo Hyeon Lim, Ji Young Lee, Jong Hyuk Lee, Saehoon Kim, Hyungjin Kim

TL;DR
This study benchmarks five vision-language models (VLMs) for chest X-ray report generation, finding that AIRead outperformed the other models and compared favorably with radiologist-written reports in report quality, clinical acceptability, and hallucination rate.
Contribution
First comprehensive evaluation of open-source and commercial VLMs for CXR report generation against radiologist reports, highlighting AIRead's strengths and variability among models.
Findings
AIRead had the lowest disagreement rate with radiologists.
AIRead showed the highest clinical acceptability among models.
Hallucinations were rare with AIRead, similar to radiologists.
Abstract
Purpose: To benchmark open-source or commercial medical image-specific VLMs against real-world radiologist-written reports.
Methods: This retrospective study included adult patients who presented to the emergency department between January 2022 and April 2025 and underwent same-day CXR and CT for febrile or respiratory symptoms. Reports from five VLMs (AIRead, Lingshu, MAIRA-2, MedGemma, and MedVersa) and radiologist-written reports were randomly presented and blindly evaluated by three thoracic radiologists using four criteria: RADPEER, clinical acceptability, hallucination, and language clarity. Comparative performance was assessed using generalized linear mixed models, with radiologist-written reports treated as the reference. Finding-level analyses were also performed with CT as the reference.
Results: A total of 478 patients (median age, 67 years [interquartile range, 50-78]; 282…
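The blinded, randomized presentation described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the "Report N" labels, and the rating tuple layout are all hypothetical, and the second function computes only raw per-source proportions for a binary criterion, not the generalized linear mixed models used in the study.

```python
import random
from collections import defaultdict

# The six report sources named in the abstract.
SOURCES = ["AIRead", "Lingshu", "MAIRA-2", "MedGemma", "MedVersa", "Radiologist"]


def blind_reports(reports_by_source, rng):
    """Shuffle one case's reports and hide their origin behind neutral labels.

    Returns (blinded, key): `blinded` maps a label such as "Report 1" to the
    text shown to a reader; `key` maps the same label back to its source.
    (Hypothetical helper, for illustration only.)
    """
    sources = list(reports_by_source)
    rng.shuffle(sources)  # random presentation order for this case
    blinded, key = {}, {}
    for i, source in enumerate(sources, start=1):
        label = f"Report {i}"
        blinded[label] = reports_by_source[source]
        key[label] = source
    return blinded, key


def rate_by_source(ratings, key_by_case):
    """Unblind binary ratings and return the raw proportion of 1s per source.

    `ratings` is an iterable of (case_id, reader_id, label, value) tuples with
    value in {0, 1}, e.g. 1 = "clinically acceptable" (assumed encoding).
    """
    counts = defaultdict(lambda: [0, 0])  # source -> [positives, total]
    for case_id, _reader_id, label, value in ratings:
        source = key_by_case[case_id][label]
        counts[source][0] += value
        counts[source][1] += 1
    return {s: pos / total for s, (pos, total) in counts.items()}
```

Keeping the unblinding key separate from what readers see is the point of the design: ratings are recorded against neutral labels and mapped back to sources only at analysis time. Raw proportions like these ignore the clustering of ratings within patients and readers, which is what a mixed model would account for.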
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Radiology practices and education · COVID-19 diagnosis using AI
