Radiology-Aware Model-Based Evaluation Metric for Report Generation
Amos Calamida, Farhad Nooralahzadeh, Morteza Rohanian, Koji Fujimoto, Mizuho Nishio, Michael Krauthammer

TL;DR
This paper introduces a radiology-specific automated evaluation metric for report generation, adapting the COMET architecture and demonstrating moderate to high correlation with established metrics and human judgment.
Contribution
It presents a novel radiology-aware evaluation metric based on COMET, including four trained model checkpoints, one utilizing RadGraph, and shows its effectiveness in correlating with human assessments.
Findings
The metric correlates moderately to highly with BERTscore, BLEU, and CheXbert.
One checkpoint shows high correlation with radiologist human judgment.
The method demonstrates potential as an effective radiology-specific evaluation tool.
Abstract
We propose a new automated evaluation metric for machine-generated radiology reports using the successful COMET architecture adapted for the radiology domain. We train and publish four medically-oriented model checkpoints, including one trained on RadGraph, a radiology knowledge graph. Our results show that our metric has moderate to high correlation with established metrics such as BERTscore, BLEU, and CheXbert scores. Furthermore, we demonstrate that one of our checkpoints exhibits a high correlation with human judgment, as assessed on a set of 200 reports using the publicly available annotations of six board-certified radiologists. We also performed our own analysis, gathering annotations from two radiologists on a collection of 100 reports. The results indicate the potential effectiveness of our method as a radiology-specific evaluation metric. The code, data, and model checkpoints…
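The abstract reports correlation between metric scores and radiologist judgment but, as excerpted here, does not say which correlation coefficient was used. As a minimal sketch, assuming a rank correlation such as Kendall's tau-b is an acceptable stand-in, the comparison could look like the following. The function name `kendall_tau_b` and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences.

    Illustrative helper only; the paper's actual correlation measure is
    not stated in this summary.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    # Compare every pair of items once.
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    total_pairs = n * (n - 1) // 2
    denom = math.sqrt((total_pairs - ties_x) * (total_pairs - ties_y))
    if denom == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return (concordant - discordant) / denom


# Made-up values for illustration only (NOT results from the paper):
# one automated metric score per generated report, and the number of
# errors a radiologist counted in the same report.
metric_scores = [0.91, 0.42, 0.77, 0.30, 0.65]
radiologist_error_counts = [0, 4, 1, 5, 2]

print(kendall_tau_b(metric_scores, radiologist_error_counts))
```

Under this setup a useful metric would be expected to correlate negatively with error counts, since higher scores should accompany fewer errors. Tau-b is used in the sketch because integer error counts tend to contain ties.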
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
