GREEN: Generative Radiology Report Evaluation and Error Notation
Sophie Ostmeier, Justin Xu, Zhihong Chen, Maya Varma, Louis Blankemeier, Christian Bluethgen, Arne Edward Michalson, Michael Moseley, Curtis Langlotz, Akshay S Chaudhari, Jean-Benoit Delbrouck

TL;DR
GREEN is a new evaluation metric for radiology reports that uses language models to identify, explain, and score clinically significant errors, improving alignment with expert assessments and providing interpretable feedback.
Contribution
The paper introduces GREEN, a lightweight, open-source metric that uses language models to evaluate factual correctness and explain errors in radiology reports. It outperforms existing metrics in correlation with expert error counts and in agreement with expert preferences.
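To make the idea of scoring a report from identified errors concrete, here is a minimal sketch. It assumes that a language model has already annotated a candidate report with a count of findings that match the reference and counts of clinically significant errors per category, and that the score is the share of matched findings among matched findings plus significant errors. This summary does not state the scoring formula, so the ratio, the class `ErrorAnnotation`, the function `error_ratio_score`, and the category names are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorAnnotation:
    """Hypothetical container for one annotated candidate report."""
    matched_findings: int
    # Maps an (assumed) error category name to a count of significant errors.
    significant_errors: dict[str, int] = field(default_factory=dict)


def error_ratio_score(annotation: ErrorAnnotation) -> float:
    """Assumed score: matched / (matched + total significant errors).

    Returns 0.0 when there are neither matched findings nor errors,
    which is a convention chosen for this sketch.
    """
    total_errors = sum(annotation.significant_errors.values())
    denominator = annotation.matched_findings + total_errors
    if denominator == 0:
        return 0.0
    return annotation.matched_findings / denominator


example = ErrorAnnotation(
    matched_findings=3,
    significant_errors={"false_finding": 1, "missing_finding": 0},
)
print(error_ratio_score(example))  # 3 / (3 + 1) = 0.75
```

A score of this shape lies between 0 and 1, and the per-category counts stay available alongside it, so the number can be read together with the errors that produced it.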
Findings
GREEN correlates more strongly with expert error counts than existing metrics.
GREEN aligns more closely with expert preferences than existing metrics.
GREEN provides interpretable error explanations.
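The first finding is a statement about rank agreement between a metric and expert error counts. The sketch below shows one way such agreement could be measured, using Kendall's tau-b. This summary does not say which correlation statistic was used, so the choice of tau-b is an assumption, and the numbers are toy values made up for illustration, not results from the paper.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b rank correlation, computed over all pairs of items."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator if denominator else 0.0


# Toy values: a higher metric score should go with fewer expert-counted errors.
metric_scores = [0.9, 0.5, 0.7, 0.2]
expert_error_counts = [0, 3, 1, 5]
print(kendall_tau_b(metric_scores, expert_error_counts))  # -1.0
```

Because a higher score is meant to indicate a better report, a metric that tracks expert judgment well would show a strongly negative correlation with error counts, as in this toy case.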
Abstract
Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human interpretable explanations of clinically significant errors, enabling feedback loops with end-users, and 3) a…
Taxonomy
Topics: Radiology practices and education · Radiation Dose and Imaging · Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Adam
