ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics
Oishi Banerjee, Agustina Saenz, Kay Wu, Warren Clements, Adil Zia, Dominic Buensalido, Helen Kavnoudias, Alain S. Abi-Ghanem, Nour El Ghawi, Cibele Luna, Patricia Castillo, Khaled Al-Surimi, Rayyan A. Daghistani, Yuh-Min Chen, Heng-sheng Chao, Lars Heiliger, Moon Kim

TL;DR
ReXamine-Global is a framework that tests the robustness and generalizability of radiology report evaluation metrics across diverse hospital sites and styles, revealing significant gaps in their reliability.
Contribution
We introduce ReXamine-Global, a novel LLM-powered framework for systematically evaluating the consistency of radiology report evaluation metrics across multiple hospitals and reporting styles, highlighting their limitations.
Findings
Existing metrics show significant variability across sites.
ReXamine-Global identifies metrics sensitive to reporting style.
The framework guides the development of more robust evaluation metrics.
Abstract
Given the rapidly expanding capabilities of generative AI models for radiology, there is a need for robust metrics that can accurately measure the quality of AI-generated radiology reports across diverse hospitals. We develop ReXamine-Global, an LLM-powered, multi-site framework that tests metrics across different writing styles and patient populations, exposing gaps in their generalization. First, our method tests whether a metric is undesirably sensitive to reporting style, providing different scores depending on whether AI-generated reports are stylistically similar to ground-truth reports or not. Second, our method measures whether a metric reliably agrees with experts, or whether metric and expert scores of AI-generated report quality diverge for some sites. Using 240 reports from 6 hospitals around the world, we apply ReXamine-Global to 7 established report evaluation metrics and…
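The two checks described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a mean score gap as the style-sensitivity measure, and the use of Pearson correlation as the agreement measure are all assumptions made here for illustration, and the numbers in the usage example are invented toy values.

```python
import math
from statistics import mean


def style_sensitivity(scores_style_matched, scores_style_mismatched):
    """Mean score gap for the same reports under two style conditions.

    Hypothetical measure: each pair holds the metric's score for one
    AI-generated report when its style matches the ground truth and when
    it does not. A gap near zero suggests the metric ignores style.
    """
    if len(scores_style_matched) != len(scores_style_mismatched):
        raise ValueError("score lists must be paired one-to-one")
    return mean(
        m - u for m, u in zip(scores_style_matched, scores_style_mismatched)
    )


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def per_site_agreement(records):
    """Correlate metric and expert scores separately for each site.

    `records` is an iterable of (site, metric_score, expert_score) tuples
    (an assumed data layout, not one taken from the paper).
    """
    by_site = {}
    for site, metric_score, expert_score in records:
        metric_scores, expert_scores = by_site.setdefault(site, ([], []))
        metric_scores.append(metric_score)
        expert_scores.append(expert_score)
    return {site: pearson(ms, es) for site, (ms, es) in by_site.items()}


# Toy usage with made-up numbers.
gap = style_sensitivity([0.8, 0.6], [0.5, 0.5])
agreement = per_site_agreement([
    ("site_a", 1.0, 2.0), ("site_a", 2.0, 4.0), ("site_a", 3.0, 6.0),
    ("site_b", 1.0, 6.0), ("site_b", 2.0, 4.0), ("site_b", 3.0, 2.0),
])
```

In this toy setup, a large positive `gap` would flag a metric that rewards stylistic similarity, and a site whose correlation is much lower than the others would flag a place where the metric and the experts diverge.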
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Biomedical Text Mining and Ontologies · Topic Modeling
