Exploring the Capabilities and Limitations of Large Language Models for Radiation Oncology Decision Support
Florian Putz, Marlen Haderlein, Sebastian Lettmaier, Sabine Semrau, Rainer Fietkau, Yixing Huang

TL;DR
This study evaluates GPT-4 for radiation oncology decision support, reporting high accuracy on specialized exams and structure re-labeling and discussing both its potential and its current limitations in clinical applications.
Contribution
It provides a comprehensive assessment of GPT-4's capabilities and limitations in radiation oncology, including physics knowledge, clinical exams, and structure labeling tasks.
Findings
GPT-4 outperforms other LLMs on radiation oncology physics questions.
GPT-4 achieved 74.57% accuracy on the ACR Radiation Oncology In-Training (TXIT) exam.
GPT-4 attained over 96% accuracy when re-labeling structure names in accordance with the AAPM TG-263 report.
Abstract
The rapidly evolving integration of LLMs into decision-support tools is driving a significant transformation across large-scale systems. As in other medical fields, LLMs such as GPT-4 are attracting increasing interest in radiation oncology. GPT-4's performance in radiation oncology was assessed with a dedicated 100-question examination on the highly specialized topic of radiation oncology physics, which revealed GPT-4's superiority over other LLMs. Its performance across the broader field of clinical radiation oncology is further benchmarked with the ACR Radiation Oncology In-Training (TXIT) exam, on which GPT-4 achieved a high accuracy of 74.57%. Its performance on re-labelling structure names in accordance with the AAPM TG-263 report has also been benchmarked, with accuracies above 96%. Such studies shed light on the potential of LLMs in radiation…
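The accuracy figures quoted above are percentages of correctly answered items. As a minimal sketch of how such a score could be computed, assuming exact-match grading of multiple-choice answers after a simple normalization (the function names, the normalization, and the toy data are hypothetical and are not taken from the paper):

```python
def normalize(answer: str) -> str:
    """Trim whitespace and ignore case (an assumed, simplistic normalization)."""
    return answer.strip().lower()


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty set")
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Toy multiple-choice answers, made up for illustration (not exam data).
model_answers = ["A", "c", "B", "D"]
answer_key = ["A", "C", "D", "D"]

score = accuracy(model_answers, answer_key)
print(f"{score:.2f}%")
```

Case-insensitive matching is a convenience for letter answers only; structure names are often case-sensitive, so a re-labeling benchmark would likely need a stricter comparison.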
Taxonomy
Methods: Attention Is All You Need · Absolute Position Encodings · Softmax · Linear Layer · Adam · Residual Connection · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing
