Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems
Michael E. Garcia-Alcoser, Mobina GhojoghNejad, Fakrul Islam Tushar, David Kim, Kyle J. Lafata, Geoffrey D. Rubin, Joseph Y. Lo

TL;DR
This study evaluates the performance of lightweight large language models in automating disease labeling of CT radiology reports across multiple organ systems, demonstrating their superiority over rule-based methods and their potential for clinical application.
Contribution
It introduces the use of open-weight LLMs for zero-shot multi-disease labeling in CT reports, showing they outperform rule-based algorithms and generalize across organ systems.
Findings
Llama-3.1 8B and Gemma-3 27B achieved the highest agreement scores (median Cohen's κ of 0.87 on the internal test set).
Lightweight LLMs outperformed rule-based algorithms in macro-F1 scores.
Models generalized well across different datasets and organ systems.
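To make the macro-F1 comparison concrete, here is a minimal Python sketch of how micro- and macro-averaged F1 differ for multi-label report annotation. The label names and the toy predictions are hypothetical illustrations, not data or code from the paper. Macro-F1 averages the per-label F1 scores, so a rare label that is handled poorly pulls the score down as much as a common one; micro-F1 pools all counts first.

```python
from typing import Dict, List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: List[Dict[str, int]],
    y_pred: List[Dict[str, int]],
    labels: List[str],
) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for binary multi-label annotations."""
    # Per-label [tp, fp, fn] counters.
    counts = {label: [0, 0, 0] for label in labels}
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if pred[label] and truth[label]:
                counts[label][0] += 1  # true positive
            elif pred[label] and not truth[label]:
                counts[label][1] += 1  # false positive
            elif truth[label] and not pred[label]:
                counts[label][2] += 1  # false negative

    # Macro: mean of the per-label F1 scores.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels)

    # Micro: F1 computed on counts pooled over all labels.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    return micro, macro


# Hypothetical labels and toy annotations for four reports.
LABELS = ["atelectasis", "nodule"]
reference = [
    {"atelectasis": 1, "nodule": 0},
    {"atelectasis": 1, "nodule": 1},
    {"atelectasis": 0, "nodule": 0},
    {"atelectasis": 1, "nodule": 0},
]
predicted = [
    {"atelectasis": 1, "nodule": 0},
    {"atelectasis": 1, "nodule": 0},
    {"atelectasis": 0, "nodule": 1},
    {"atelectasis": 1, "nodule": 0},
]

micro, macro = micro_macro_f1(reference, predicted, LABELS)
print(f"micro-F1 = {micro:.2f}, macro-F1 = {macro:.2f}")
```

In this toy case the common label is labeled perfectly and the rare one is always wrong, so micro-F1 is 0.75 while macro-F1 drops to 0.50.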
Abstract
Purpose: This study aims to evaluate the effectiveness of large language models (LLMs) in automating disease annotation of CT radiology reports. We compare a rule-based algorithm (RBA), RadBERT, and three lightweight open-weight LLMs for multi-disease labeling of chest, abdomen, and pelvis (CAP) CT reports. Materials and Methods: This retrospective study analyzed 40,833 CAP CT reports from 29,540 patients, with 1,789 reports manually annotated across three organ systems. External validation was conducted using the CT RATE dataset. Three open-weight LLMs were tested with zero-shot prompting. Performance was evaluated using Cohen's kappa (κ) and micro/macro-averaged F1 scores. Results: In the internal test set of 12,197 CAP reports from 8,854 patients, Llama-3.1 8B and Gemma-3 27B showed the highest agreement (κ median: 0.87). On the manually…
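For readers unfamiliar with the agreement metric, Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The Python sketch below is a minimal illustration under that standard definition; the two label sequences are invented toy data, and the function is not the authors' evaluation code.

```python
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n

    # Chance agreement: sum over categories of the product of the
    # two annotators' marginal frequencies for that category.
    categories = set(a) | set(b)
    expected = sum(
        (list(a).count(c) / n) * (list(b).count(c) / n) for c in categories
    )

    if expected == 1.0:
        # Both annotators used a single identical category; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example: presence (1) or absence (0) of one finding in ten reports,
# as labeled by a hypothetical model and a hypothetical reference labeler.
model_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reference_labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]

kappa = cohen_kappa(model_labels, reference_labels)
print(f"kappa = {kappa:.3f}")
```

Here the two sequences agree on 8 of 10 reports (p_o = 0.80) while chance agreement is p_e = 0.52, giving κ ≈ 0.583. Raw agreement alone would overstate performance when most reports are negative for a finding, which is why a chance-corrected measure is useful.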
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Radiology practices and education · Radiomics and Machine Learning in Medical Imaging
