CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation
Mohammed Baharoon, Thibault Heintz, Siavash Raissi, Mahmoud Alabbad, Mona Alhammad, Hassan AlOmaish, Sung Eun Kim, Oishi Banerjee, Pranav Rajpurkar

TL;DR
CRIMSON is a clinically grounded evaluation metric for chest X-ray report generation that incorporates full clinical context, an error taxonomy, and severity weighting, and it aligns strongly with radiologist judgments.
Contribution
The paper introduces a clinically informed evaluation framework for radiology reports that accounts for diagnostic importance and error severity, and it reports better performance than prior metrics.
Findings
CRIMSON aligns well with radiologist-annotated error counts (Kendall's tau = 0.61-0.71).
It shows consistent agreement with expert judgments in RadJudge scenarios.
It achieves strong correlation with radiologist preferences in RadPref.
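For readers unfamiliar with the statistic behind the first finding, the following is a minimal sketch of Kendall's tau (the tau-b variant, which corrects for ties). It is not the authors' evaluation code: the function name `kendall_tau_b` and the toy numbers are assumptions chosen for illustration, and the summary does not say which tau variant the paper uses.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: excluded from both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # the pair is ordered the same way in x and y
        else:
            discordant += 1  # the pair is ordered oppositely in x and y

    # Pairs untied in x times pairs untied in y, as in the tau-b definition.
    untied_in_x = concordant + discordant + ties_y_only
    untied_in_y = concordant + discordant + ties_x_only
    denominator = sqrt(untied_in_x * untied_in_y)
    if denominator == 0:
        return float("nan")  # undefined when one sequence is constant
    return (concordant - discordant) / denominator


# Toy data (hypothetical, not from the paper): a metric's error score per
# report and the number of errors a radiologist annotated for the same report.
metric_error_scores = [1, 2, 3, 4]
radiologist_error_counts = [1, 3, 2, 4]
tau = kendall_tau_b(metric_error_scores, radiologist_error_counts)
```

In this toy case five of the six report pairs are ordered the same way by both sequences and one is reversed, so tau is (5 - 1) / 6. A value of 1 means the two rankings agree on every pair, and -1 means they disagree on every pair.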
Abstract
We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation that assesses reports based on diagnostic correctness, contextual relevance, and patient safety. Unlike prior metrics, CRIMSON incorporates full clinical context, including patient age, indication, and guideline-based decision rules, and prevents normal or clinically insignificant findings from exerting disproportionate influence on the overall score. The framework categorizes errors into a comprehensive taxonomy covering false findings, missing findings, and eight attribute-level errors (e.g., location, severity, measurement, and diagnostic overinterpretation). Each finding is assigned a clinical significance level (urgent, actionable non-urgent, non-actionable, or expected/benign), based on a guideline developed in collaboration with attending cardiothoracic radiologists, enabling…
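The idea of weighting errors by clinical significance can be illustrated with a small sketch. Everything in it is an assumption for illustration: the weight values, the `Finding` class, and the `weighted_error_penalty` function are hypothetical and are not CRIMSON's actual scoring rule, which this summary does not give. Only the four significance levels come from the abstract.

```python
from dataclasses import dataclass

# Hypothetical weights per significance level. The level names follow the
# abstract; the numeric values are invented for this example.
SIGNIFICANCE_WEIGHTS = {
    "urgent": 1.0,
    "actionable_non_urgent": 0.5,
    "non_actionable": 0.25,
    "expected_benign": 0.0,
}


@dataclass(frozen=True)
class Finding:
    description: str
    significance: str  # one of the keys in SIGNIFICANCE_WEIGHTS
    status: str        # "matched", "missing", or "false"


def weighted_error_penalty(findings):
    """Sum the significance weights of all missing or false findings."""
    penalty = 0.0
    for finding in findings:
        if finding.status in ("missing", "false"):
            penalty += SIGNIFICANCE_WEIGHTS[finding.significance]
    return penalty


# Illustrative report comparison (hypothetical findings and labels).
example_findings = [
    Finding("pneumothorax", "urgent", "missing"),
    Finding("normal heart size", "expected_benign", "missing"),
    Finding("pleural effusion", "actionable_non_urgent", "matched"),
]
penalty = weighted_error_penalty(example_findings)
```

With these assumed weights, omitting an expected or benign finding adds nothing to the penalty, while omitting an urgent finding adds the maximum weight. This mirrors the abstract's stated goal of keeping normal or clinically insignificant findings from having disproportionate influence on the score.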
Taxonomy
Topics: Radiology practices and education · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
