Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report
Jason Holmes, Lian Zhang, Yuzhen Ding, Hongying Feng, Zhengliang Liu, Tianming Liu, William W. Wong, Sujay A. Vora, Jonathan B. Ashman, Wei Liu

TL;DR
This study benchmarks GPT-4's ability to re-label radiotherapy structure names according to the AAPM TG-263 standard across three disease sites (prostate, head and neck, and thorax), reporting re-labeling accuracy of 96.0% to 98.5% and proposing LLMs as a standardization tool in radiation oncology.
Contribution
The paper introduces a novel benchmark for evaluating LLMs in medical image structure name standardization and demonstrates GPT-4's high accuracy in this task.
Findings
Re-labeling accuracy: 96.0% to 98.5% across sites.
Target volume re-labeling accuracy: up to 100%.
LLMs are promising for standardizing structure names.
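The accuracy figures above can be read as the fraction of structures whose re-labeled name matches a reference label. The following is a minimal sketch of that calculation, assuming exact string matching against a manually curated reference; the function name `relabel_accuracy` and the example names are hypothetical, and the scoring procedure the authors actually used is not described in this summary.

```python
from typing import Dict


def relabel_accuracy(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Fraction of reference structures whose predicted label matches exactly.

    Both dicts map an original structure name to a standardized name.
    Structures missing from `predicted` count as incorrect.
    """
    if not reference:
        raise ValueError("reference must contain at least one structure")
    correct = sum(
        1 for original, truth in reference.items()
        if predicted.get(original) == truth
    )
    return correct / len(reference)


# Illustrative data only; not taken from the paper.
reference = {
    "bladder": "Bladder",
    "rectum": "Rectum",
    "cord": "SpinalCord",
    "heart": "Heart",
}
predicted = {
    "bladder": "Bladder",
    "rectum": "Rectum",
    "cord": "Spinal_Cord",  # mismatch, counted as incorrect
    "heart": "Heart",
}
accuracy = relabel_accuracy(predicted, reference)
```

With three of four labels matching, `accuracy` here is 0.75. Exact matching is the strictest choice; a study could also score case-insensitively or per patient, which would change the reported number.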
Abstract
Purpose: To introduce the concept of using large language models (LLMs) to re-label structure names in accordance with the American Association of Physicists in Medicine (AAPM) Task Group (TG)-263 standard, and to establish a benchmark for future studies to reference. Methods and Materials: The Generative Pre-trained Transformer (GPT)-4 application programming interface (API) was implemented as a Digital Imaging and Communications in Medicine (DICOM) storage server, which, upon receiving a structure set DICOM file, prompts GPT-4 to re-label the structure names of both target volumes and normal tissues according to AAPM TG-263. Three disease sites (prostate, head and neck, and thorax) were selected for evaluation. For each disease site category, 150 patients were randomly selected for manually tuning the instructions prompt (in batches of 50) and 50 patients were randomly selected…
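The workflow in the abstract (receive a structure set, prompt the model, write back the re-labeled names) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the JSON reply format, and the names `build_prompt`, `relabel`, and `fake_model` are all hypothetical, and the DICOM receiving step and the real GPT-4 API call are replaced by a plain list of names and a stub callable.

```python
import json
from typing import Callable, Dict, List


def build_prompt(names: List[str], site: str) -> str:
    """Assemble a hypothetical instruction prompt listing the raw structure names."""
    listing = "\n".join(f"- {name}" for name in names)
    return (
        f"Disease site: {site}\n"
        "Re-label each structure name below according to AAPM TG-263.\n"
        "Reply with a JSON object mapping each original name to its new name.\n"
        f"{listing}"
    )


def relabel(names: List[str], site: str, ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Send the prompt through `ask_model` and return original -> new name."""
    reply = ask_model(build_prompt(names, site))
    mapping = json.loads(reply)
    # Keep the original name whenever the model omits it or returns a non-string.
    return {
        name: mapping[name] if isinstance(mapping.get(name), str) else name
        for name in names
    }


def fake_model(prompt: str) -> str:
    """Stand-in for the LLM call (assumption: the model replies with JSON)."""
    return json.dumps({"bladder": "Bladder", "Lt Fem Head": "Femur_Head_L"})


result = relabel(["bladder", "Lt Fem Head", "couch"], "prostate", fake_model)
```

Passing the model as a callable keeps the sketch testable without network access; in a real deployment `ask_model` would wrap the API request, and the input names would come from the received structure set file.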
Taxonomy
Topics: Law, AI, and Intellectual Property
Methods: Attention Is All You Need · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization · Linear Layer
