CRG Score: A Distribution-Aware Clinical Metric for Radiology Report Generation
Ibrahim Ethem Hamamci, Sezgin Er, Suprosanna Shit, Hadrien Reynaud, Bernhard Kainz, Bjoern Menze

TL;DR
The paper introduces the CRG Score, a distribution-aware metric for evaluating radiology report generation that scores only clinically relevant abnormalities and balances penalties according to label distribution for fairer assessment.
Contribution
It proposes the CRG Score, an evaluation metric that considers only the clinically relevant abnormalities explicitly described in reference reports and adapts its penalties to the label distribution, improving robustness and clinical alignment.
Findings
CRG Score effectively captures clinical correctness in report evaluation.
It balances penalties based on label distribution for fairer assessments.
It supports both binary and structured labels (e.g., type, location) and can be paired with any LLM for feature extraction.
Abstract
Evaluating long-context radiology report generation is challenging. NLG metrics fail to capture clinical correctness, while LLM-based metrics often lack generalizability. Clinical accuracy metrics are more relevant but are sensitive to class imbalance, frequently favoring trivial predictions. We propose the CRG Score, a distribution-aware and adaptable metric that evaluates only clinically relevant abnormalities explicitly described in reference reports. CRG supports both binary and structured labels (e.g., type, location) and can be paired with any LLM for feature extraction. By balancing penalties based on label distribution, it enables fairer, more robust evaluation and serves as a clinically aligned reward function.
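The class-imbalance problem mentioned in the abstract can be illustrated with a small sketch. The code below is not the CRG formula, which this summary does not state. It swaps in Youden's J statistic (sensitivity + specificity - 1) as a simple stand-in for a score whose penalties are scaled by label prevalence. The function names and the toy label vectors are assumptions made for illustration only.

```python
from typing import Sequence


def accuracy(ref: Sequence[int], pred: Sequence[int]) -> float:
    """Plain accuracy over binary abnormality labels (1 = abnormal)."""
    return sum(r == p for r, p in zip(ref, pred)) / len(ref)


def prevalence_balanced_score(ref: Sequence[int], pred: Sequence[int]) -> float:
    """Hypothetical stand-in, NOT the paper's CRG Score.

    Computes Youden's J: true-positive rate minus false-positive rate.
    Each missed abnormality costs 1/n_pos and each false alarm costs
    1/n_neg, so penalties depend on how common each class is.
    """
    tp = sum(r == 1 and p == 1 for r, p in zip(ref, pred))
    fp = sum(r == 0 and p == 1 for r, p in zip(ref, pred))
    n_pos = sum(ref)
    n_neg = len(ref) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / n_pos - fp / n_neg


# Toy data (assumed): 5 abnormal findings among 100 labels.
reference = [1] * 5 + [0] * 95

# Trivial model: predicts "normal" everywhere.
all_negative = [0] * 100

# Useful model: finds 4 of 5 abnormalities, raises 10 false alarms.
useful = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

acc_trivial = accuracy(reference, all_negative)                  # 0.95
acc_useful = accuracy(reference, useful)                         # 0.89
bal_trivial = prevalence_balanced_score(reference, all_negative)  # 0.0
bal_useful = prevalence_balanced_score(reference, useful)         # about 0.695
```

On this toy data, plain accuracy ranks the trivial all-negative prediction above the model that actually finds abnormalities, while the prevalence-scaled score reverses the ranking. This is the failure mode that a distribution-aware metric is meant to avoid.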
