Impact of Large Language Model Assistance on Radiologists’ Diagnostic Performance for Brain Tumors by Experience Level
Chae Won Song, Byung Hyun Baek, Seul Kee Kim, Woong Yoon, Yun Young Lee, Ilwoo Park, Jae Hyun Park, Seol Bin Park, In Woo Choi

TL;DR
This study shows that large language model assistance can improve brain tumor diagnosis by both board-certified radiologists and trainees, with the greater benefit seen in trainees.
Contribution
The novel finding is that LLM assistance significantly improves trainees' diagnostic accuracy and expands differential considerations.
Findings
The two LLMs evaluated, Claude 3.5 Sonnet and ChatGPT-4o, achieved high top-three differential diagnostic accuracy, comparable to that of radiologists.
Trainees' diagnostic accuracy improved significantly with LLM assistance in both primary and differential diagnoses.
Radiologists' top-three differential accuracy improved notably after receiving LLM-generated diagnoses.
Abstract
Background: Large language models (LLMs) may assist radiologists in interpreting brain tumor MRI. We compared the diagnostic accuracy of ChatGPT-4o and Claude 3.5 Sonnet with that of board-certified radiologists and trainees, and evaluated whether LLM assistance could enhance diagnostic performance. Methods: A total of 127 histologically confirmed brain tumor cases were included. Two LLMs analyzed representative MRI images together with structured radiologic reports, whereas two board-certified radiologists and three trainees reviewed representative images with basic demographic information only. All participants generated up to three differential diagnoses per case. The accuracy of the primary diagnosis and the accuracy of the top-three differential diagnoses were calculated and compared. Following the initial readings, LLM-generated differential diagnoses were provided to the readers,…
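The two metrics named in the Methods, primary-diagnosis accuracy and top-three differential accuracy, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `top_k_accuracy`, the toy cases, and the exact string matching against the histologic diagnosis are hypothetical, and the study's actual scoring procedure is not described in this excerpt.

```python
def top_k_accuracy(ranked_lists, truths, k):
    """Fraction of cases whose confirmed diagnosis is among the first k entries.

    ranked_lists: one ranked list of differential diagnoses per case (up to three).
    truths: the confirmed diagnosis for each case, in the same order.
    Matching is by exact string equality, which is a simplifying assumption.
    """
    if len(ranked_lists) != len(truths):
        raise ValueError("ranked_lists and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, truths))
    return hits / len(truths)


# Hypothetical toy data, not taken from the study.
truths = ["glioblastoma", "meningioma", "lymphoma", "metastasis"]
reader_lists = [
    ["glioblastoma", "metastasis", "lymphoma"],  # correct as primary diagnosis
    ["schwannoma", "meningioma"],                # correct only within the top three
    ["glioblastoma", "metastasis", "abscess"],   # missed entirely
    ["metastasis"],                              # correct as primary diagnosis
]

primary_accuracy = top_k_accuracy(reader_lists, truths, k=1)  # 2 of 4 cases
top3_accuracy = top_k_accuracy(reader_lists, truths, k=3)     # 3 of 4 cases
print(primary_accuracy, top3_accuracy)
```

With k=1 only the first-listed diagnosis counts, so top-three accuracy can never be lower than primary accuracy for the same reader.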
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Radiology practices and education
