Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of Traditional ML and LLM Approaches

Namu Park; Giridhar Kaushik Ramachandran; Kevin Lybarger; Fei Xia; Ozlem Uzuner; Meliha Yetisgen; Martin Gunn

arXiv:2511.11867 · cs.CL · November 18, 2025

Open Access

TL;DR

This study introduces a new annotated dataset of radiology reports and compares traditional ML models with generative LLMs for identifying follow-up imaging, showing that task-optimized LLMs reach near-human-level accuracy.

Contribution

The paper provides a novel annotated corpus for follow-up detection and systematically evaluates both traditional ML and large language models on this task.

Findings

01

GPT-4o (Advanced) achieved the highest F1 score of 0.832.

02

Optimized prompts significantly improved LLM reasoning accuracy.

03

Traditional models such as LR and SVM performed competitively with the LLMs.
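The findings compare systems by F1 score. As a reminder of what that number measures, here is a minimal sketch of precision, recall, and F1 for a single positive class ("follow-up imaging indicated"). The labels below are invented toy values, not data from the paper's corpus, and the averaging scheme the authors used (binary, macro, or micro over several follow-up statuses) is not stated on this page, so treat this as an illustration of the metric only.

```python
def precision_recall_f1(gold, pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels (hypothetical): 1 = follow-up imaging indicated, 0 = not indicated.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0, 1, 0]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")  # precision=0.750 recall=0.750 f1=0.750
```

F1 is the harmonic mean of precision and recall, so a system cannot score well by flagging every report as needing follow-up (high recall, low precision) or by flagging almost none (high precision, low recall).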

Abstract

Large language models (LLMs) have shown considerable promise in clinical natural language processing, yet few domain-specific datasets exist to rigorously evaluate their performance on radiology tasks. In this work, we introduce an annotated corpus of 6,393 radiology reports from 586 patients, each labeled for follow-up imaging status, to support the development and benchmarking of follow-up adherence detection systems. Using this corpus, we systematically compared traditional machine-learning classifiers, including logistic regression (LR), support vector machines (SVM), Longformer, and a fully fine-tuned Llama3-8B-Instruct, with recent generative LLMs. To evaluate generative LLMs, we tested GPT-4o and the open-source GPT-OSS-20B under two configurations: a baseline (Base) and a task-optimized (Advanced) setting that focused inputs on metadata, recommendation sentences, and their…
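The abstract says the Advanced setting "focused inputs on metadata, recommendation sentences, and their…", but the rest of the description is cut off here. The sketch below shows one plausible way to build such a focused input. It assumes a simple keyword heuristic for spotting recommendation sentences. The cue list, the naive sentence splitter, the helper names `split_sentences` and `focus_report`, and the metadata keys are all hypothetical, not the authors' method.

```python
import re

# Hypothetical cue words for recommendation sentences; not taken from the paper.
RECOMMENDATION_CUES = re.compile(
    r"\b(recommend(?:ed|ation)?|follow[- ]?up|suggest(?:ed)?|consider|advised?)\b",
    re.IGNORECASE,
)

def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def focus_report(report_text, metadata):
    """Build a shortened LLM input: metadata lines plus cue-bearing sentences."""
    kept = [s for s in split_sentences(report_text) if RECOMMENDATION_CUES.search(s)]
    meta_lines = [f"{key}: {value}" for key, value in metadata.items()]
    return "\n".join(meta_lines + ["Recommendation sentences:"] + kept)

# Invented example report and metadata, for illustration only.
report = ("Stable 4 mm nodule in the right upper lobe. No effusion. "
          "Recommend follow-up chest CT in 12 months.")
print(focus_report(report, {"modality": "CT", "exam_date": "2020-01-15"}))
```

Dropping sentences that carry no recommendation shortens the prompt and removes findings text that is irrelevant to follow-up status. The trade-off is that a keyword filter can miss recommendations phrased without any of the cue words.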

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Radiology practices and education · Radiomics and Machine Learning in Medical Imaging