Standardizing Longitudinal Radiology Report Evaluation via Large Language Model Annotation
Xinyi Wang, Grazziela Figueredo, Ruizhe Li, Xin Chen

TL;DR
This paper introduces an LLM-based pipeline for automatically annotating longitudinal information in radiology reports, enabling standardized evaluation of report generation models and outperforming existing annotation methods.
Contribution
The study develops a novel LLM-based annotation pipeline for longitudinal radiology report analysis, creating a large benchmark dataset and improving annotation accuracy over prior methods.
Findings
Qwen2.5-32B was selected for annotation due to its efficiency and performance.
The annotated dataset enabled evaluation of seven report generation models.
The LLM-based method achieved F1-scores 11.3% and 5.3% higher than existing annotation methods in detection and tracking, respectively.
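For readers unfamiliar with the metric behind the last finding, the following is a minimal sketch of how an F1-score can be computed for a binary detection task. It assumes each sentence is labeled 1 if it carries longitudinal information and 0 otherwise. The function name `f1_score` and the toy labels are hypothetical and invented for illustration; they are not the paper's data or evaluation code, and the paper's exact scoring protocol is not described in this summary.

```python
def f1_score(gold, pred):
    """F1 for binary labels (assumed: 1 = sentence carries longitudinal information)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision or recall is zero or undefined
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only
gold = [1, 0, 1, 1, 0, 1]
pred = [1, 0, 0, 1, 1, 1]
print(f1_score(gold, pred))
```

Because F1 is the harmonic mean of precision and recall, an annotator that misses longitudinal mentions (low recall) or over-flags them (low precision) is penalized in either case.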
Abstract
Longitudinal information in radiology reports refers to the sequential tracking of findings across multiple examinations over time, which is crucial for monitoring disease progression and guiding clinical decisions. Many recent automated radiology report generation methods are designed to capture longitudinal information; however, validating their performance is challenging. There is no proper tool to consistently label temporal changes in both ground-truth and model-generated texts for meaningful comparisons. Existing annotation methods are typically labor-intensive, relying on manual lexicons and rules. Complex rules are closed-source, domain-specific and hard to adapt, whereas overly simple ones tend to miss essential specialised information. Large language models (LLMs) offer a promising annotation alternative, as they are capable of capturing nuanced linguistic patterns…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Radiology practices and education
