FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores
Alyssa Huang, Oishi Banerjee, Kay Wu, Eduardo Pontes Reis, Pranav Rajpurkar

TL;DR
FineRadScore is an automated, line-by-line evaluation tool for chest X-ray reports that uses a large language model to assess correction severity and generate explanations, aligning well with radiologist judgments.
Contribution
This work introduces FineRadScore, a novel LLM-based metric that quantifies report differences with severity scores and explanations, improving automated evaluation of CXR reports.
Findings
FineRadScore's corrections match radiologist opinions.
It aligns with radiologists and state-of-the-art metrics in report quality assessment.
It provides a detailed, explainable evaluation of generated reports.
Abstract
The current gold standard for evaluating generated chest x-ray (CXR) reports is through radiologist annotations. However, this process can be extremely time-consuming and costly, especially when evaluating large numbers of reports. In this work, we present FineRadScore, a Large Language Model (LLM)-based automated evaluation metric for generated CXR reports. Given a candidate report and a ground-truth report, FineRadScore gives the minimum number of line-by-line corrections required to go from the candidate to the ground-truth report. Additionally, FineRadScore provides an error severity rating with each correction and generates comments explaining why the correction was needed. We demonstrate that FineRadScore's corrections and error severity scores align with radiologist opinions. We also show that, when used to judge the quality of the report as a whole, FineRadScore aligns with…
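The output described in the abstract (one correction per line, each with a severity rating and an explanatory comment) can be pictured as a small record type. The following is a minimal sketch only, not the authors' implementation: the `LineCorrection` class, its field names, the integer severity scale, and the `summarize_corrections` helper are all hypothetical and introduced here for illustration. In particular, the abstract does not specify how line-level results are combined into a report-level judgment, so the count, maximum, and sum below are assumed aggregations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LineCorrection:
    """One line-level correction (hypothetical structure, not from the paper)."""
    line_index: int       # position of the line in the candidate report
    candidate_line: str   # the line as written in the candidate report
    corrected_line: str   # the line rewritten to agree with the ground truth
    severity: int         # assumed integer scale; higher means more severe
    comment: str          # explanation of why the correction was needed


def summarize_corrections(corrections: List[LineCorrection]) -> Dict[str, int]:
    """Collapse line-level corrections into report-level numbers.

    The three aggregates here are assumptions made for illustration.
    """
    severities = [c.severity for c in corrections]
    return {
        "num_corrections": len(corrections),
        "max_severity": max(severities, default=0),
        "total_severity": sum(severities),
    }
```

With two made-up corrections of severity 3 and 1, `summarize_corrections` returns a count of 2, a maximum severity of 3, and a total severity of 4. A report needing no corrections yields zeros for all three values.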
Taxonomy
Topics: Radiology practices and education · Radiomics and Machine Learning in Medical Imaging · Topic Modeling
Methods: ALIGN
