LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation
Zilong Wang, Xufang Luo, Xinyang Jiang, Dongsheng Li, Lili Qiu

TL;DR
This paper introduces an LLM-based evaluation framework for radiology report generation. With GPT-4, the proposed metric reaches evaluation consistency close to that of radiologists, and knowledge distillation yields a smaller, more accessible model.
Contribution
It proposes a new LLM-based evaluation metric for radiology reports and distills it into a compact model whose evaluation capability is comparable to GPT-4, making the method cheaper and more accessible.
Findings
GPT-4-based metric achieves evaluation consistency close to that of radiologists
Constructed a dataset using LLM evaluations for training
Distilled smaller model achieves evaluation capabilities comparable to GPT-4
Abstract
Evaluating generated radiology reports is crucial for the development of radiology AI, but existing metrics fail to reflect the task's clinical requirements. This study proposes a novel evaluation framework using large language models (LLMs) to compare radiology reports for assessment. We compare the performance of various LLMs and demonstrate that, when using GPT-4, our proposed metric achieves evaluation consistency close to that of radiologists. Furthermore, to reduce costs and improve accessibility, making this method practical, we construct a dataset using LLM evaluation results and perform knowledge distillation to train a smaller model. The distilled model achieves evaluation capabilities comparable to GPT-4. Our framework and distilled model offer an accessible and efficient evaluation method for radiology report generation, facilitating the development of more clinically…
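The dataset-construction step described in the abstract (collecting LLM evaluation results to train a smaller model) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the prompt wording, the `DistillRecord` type, the `build_prompt` and `build_distillation_set` helpers, and the `stub_teacher` function that stands in for a real call to a large LLM. The paper's actual prompt, output format, and training procedure are not given in this summary.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DistillRecord:
    """One training example for the smaller (student) model. Hypothetical type."""
    prompt: str
    teacher_output: str


def build_prompt(reference: str, candidate: str) -> str:
    # Hypothetical prompt template; the wording used in the paper is not known here.
    return (
        "Compare the candidate radiology report against the reference report "
        "and describe any clinically relevant discrepancies.\n"
        f"Reference report: {reference}\n"
        f"Candidate report: {candidate}\n"
    )


def build_distillation_set(
    pairs: List[Tuple[str, str]],
    teacher: Callable[[str], str],
) -> List[DistillRecord]:
    """Query the teacher once per (reference, candidate) pair and keep its answer."""
    records = []
    for reference, candidate in pairs:
        prompt = build_prompt(reference, candidate)
        records.append(DistillRecord(prompt=prompt, teacher_output=teacher(prompt)))
    return records


def stub_teacher(prompt: str) -> str:
    # Stand-in for a large LLM such as GPT-4; returns a fixed string so the
    # sketch runs without network access or API credentials.
    return "no discrepancy found"


if __name__ == "__main__":
    pairs = [("No acute findings.", "Lungs are clear.")]
    for record in build_distillation_set(pairs, stub_teacher):
        print(record.prompt)
        print(record.teacher_output)
```

The resulting prompt and teacher-output pairs would then serve as supervised fine-tuning data for the smaller model; passing the teacher in as a plain callable keeps the collection step independent of any particular LLM client.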
Taxonomy
Topics: Topic Modeling · Radiomics and Machine Learning in Medical Imaging · Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Dense Connections · Label Smoothing
