Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology

Meiyun Cao; Shaw Hu; Jason Sharp; Edward Clouser; Jason Holmes; Linda L. Lam; Xiaoning Ding; Diego Santos Toesca; Wendy S. Lindholm; Samir H. Patel; Sujay A. Vora; Peilong Wang; and Wei Liu

arXiv:2501.16309·physics.med-ph·November 11, 2025

Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology

Meiyun Cao, Shaw Hu, Jason Sharp, Edward Clouser, Jason Holmes, Linda L. Lam, Xiaoning Ding, Diego Santos Toesca, Wendy S. Lindholm, Samir H. Patel, Sujay A. Vora, Peilong Wang, and Wei Liu

PDF

Open Access

TL;DR

This study evaluates the use of a large language model, Llama 3.1 405B, to automate summarization of CT simulation orders in radiation oncology, showing high accuracy and potential workflow improvements.

Contribution

It demonstrates the effectiveness of a specific LLM in accurately summarizing CT orders, reducing therapist workload and enhancing consistency.

Findings

01

98% accuracy in summary generation

02

Improved consistency and readability of summaries

03

Consistent performance across different treatment groups

Abstract

Purpose: This study aims to use a large language model (LLM) to automate the generation of summaries from the CT simulation orders and evaluate its performance. Materials and Methods: A total of 607 CT simulation orders for patients were collected from the Aria database at our institution. A locally hosted Llama 3.1 405B model, accessed via the Application Programming Interface (API) service, was used to extract keywords from the CT simulation orders and generate summaries. The downloaded CT simulation orders were categorized into seven groups based on treatment modalities and disease sites. For each group, a customized instruction prompt was developed collaboratively with therapists to guide the Llama 3.1 405B model in generating summaries. The ground truth for the corresponding summaries was manually derived by carefully reviewing each CT simulation order and subsequently verified…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Radiomics and Machine Learning in Medical Imaging · Natural Language Processing Techniques

Methodstravel james · Adaptive Richard's Curve Weighted Activation · LLaMA