RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores

Yingshu Li, Yunyi Liu, Lingqiao Liu, Lei Wang, and Luping Zhou

arXiv:2508.15464·cs.CL·August 22, 2025

Open Access

TL;DR

RadReason is an evaluation framework for automatically generated radiology reports. It outputs fine-grained sub-scores across six clinically defined error types, each with a human-readable justification, and outperforms prior offline metrics on the ReXVal benchmark.

Contribution

RadReason introduces a clinically grounded, explainable evaluation method built on Group Relative Policy Optimization. It adds two components, Sub-score Dynamic Weighting and Majority-Guided Advantage Scaling, to move report assessment beyond a single coarse score.

Findings

01

Outperforms prior offline metrics on the ReXVal benchmark

02

Achieves parity with GPT-4 evaluations in accuracy

03

Provides human-readable justifications for scores

Abstract

Evaluating automatically generated radiology reports remains a fundamental challenge due to the lack of clinically grounded, interpretable, and fine-grained metrics. Existing methods either produce coarse overall scores or rely on opaque black-box models, limiting their usefulness in real-world clinical workflows. We introduce RadReason, a novel evaluation framework for radiology reports that not only outputs fine-grained sub-scores across six clinically defined error types, but also produces human-readable justifications that explain the rationale behind each score. Our method builds on Group Relative Policy Optimization and incorporates two key innovations: (1) Sub-score Dynamic Weighting, which adaptively prioritizes clinically challenging error types based on live F1 statistics; and (2) Majority-Guided Advantage Scaling, which adjusts policy gradient updates based on prompt…
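The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The group-relative advantage follows the standard Group Relative Policy Optimization form (each reward normalized by the mean and standard deviation of its sampled group). The weighting rule, a softmax over `1 - F1`, is an assumption chosen only to show how lower-F1 error types could receive more weight; the function names, the `temperature` parameter, and the placeholder error-type names are all hypothetical. Majority-Guided Advantage Scaling is left out because the abstract text is cut off before it is fully described.

```python
import math
from typing import Dict, List

# Placeholder names: the abstract states there are six error types
# but this sketch does not assume what they are called.
ERROR_TYPES = [f"error_type_{i}" for i in range(1, 7)]


def dynamic_weights(f1_by_type: Dict[str, float],
                    temperature: float = 1.0) -> Dict[str, float]:
    """Assumed weighting rule (not from the paper): softmax over (1 - F1),
    so error types with lower running F1 receive larger weights."""
    logits = {k: (1.0 - f1) / temperature for k, f1 in f1_by_type.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def weighted_reward(sub_rewards: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Combine per-error-type rewards into one scalar reward."""
    return sum(weights[k] * sub_rewards[k] for k in weights)


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style advantage: normalize each reward by the mean
    and (population) standard deviation of its sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Illustrative running F1 values per error type (made up for the demo).
    running_f1 = dict(zip(ERROR_TYPES, [0.9, 0.8, 0.6, 0.7, 0.5, 0.85]))
    weights = dynamic_weights(running_f1)

    # Illustrative sub-rewards for a group of three sampled evaluations.
    group = [
        dict(zip(ERROR_TYPES, [1, 1, 0, 1, 0, 1])),
        dict(zip(ERROR_TYPES, [1, 0, 1, 1, 1, 1])),
        dict(zip(ERROR_TYPES, [0, 1, 0, 0, 0, 1])),
    ]
    rewards = [weighted_reward(sample, weights) for sample in group]
    print(group_relative_advantages(rewards))
```

In this sketch the weights always sum to one, so the combined reward stays on the same scale as the individual sub-rewards while emphasis shifts toward the error types that are currently scored least reliably.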

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Radiology practices and education · Machine Learning in Healthcare