Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models

Reza Khanmohammadi; Ahmed I Ghanem; Kyle Verdecchia; Ryan Hall; Mohamed Elshaikh; Benjamin Movsas; Hassan Bagher-Ebadian; Indrin Chetty; Mohammad M. Ghassemi; Kundan Thind

arXiv:2402.04075·cs.CL·February 7, 2024·1 cites

Open Access

TL;DR

This paper presents an iterative teacher-student framework using LLMs to enhance symptom extraction accuracy from clinical notes in prostate cancer radiotherapy, demonstrating significant performance improvements.

Contribution

Introduces a novel iterative prompt refinement method with teacher-student LLMs for improved clinical note symptom extraction in radiation oncology.

Findings

01. Accuracy improved from 0.51 to 0.71 on single-symptom notes

02. Precision increased from 0.52 to 0.82 on single-symptom notes

03. F1 score rose from 0.49 to 0.73 on single-symptom notes

Abstract

This study introduces a novel teacher-student architecture utilizing Large Language Models (LLMs) to improve prostate cancer radiotherapy symptom extraction from clinical notes. Mixtral, the student model, initially extracts symptoms, followed by GPT-4, the teacher model, which refines prompts based on Mixtral's performance. This iterative process involved 294 single-symptom clinical notes across 12 symptoms, with up to 16 rounds of refinement per epoch. Results showed significant improvements in extracting symptoms from both single-symptom and multi-symptom notes. For 59 single-symptom notes, accuracy increased from 0.51 to 0.71, precision from 0.52 to 0.82, recall from 0.52 to 0.72, and F1 score from 0.49 to 0.73. In 375 multi-symptom notes, accuracy rose from 0.24 to 0.43, precision from 0.60 to 0.76, recall from 0.24 to 0.43, and F1 score from 0.20 to 0.44. These results demonstrate the…
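The refinement loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the student and teacher are toy stand-ins for Mixtral and GPT-4 (no model is actually called), the symptom names and notes are made-up placeholders, and the rule of keeping a refined prompt only when accuracy improves is an assumption, since the abstract does not state the acceptance criterion. Only the cap of 16 rounds comes from the abstract.

```python
from typing import Callable, List, Tuple

Note = Tuple[str, str]             # (note text, gold symptom label)
Error = Tuple[str, str, str]       # (note text, gold label, student output)
Student = Callable[[str, str], str]
Teacher = Callable[[str, List[Error]], str]

# Placeholder symptom vocabulary; the paper's 12 symptoms are not listed here.
SYMPTOMS = {"fatigue", "nausea", "dysuria"}


def accuracy(prompt: str, notes: List[Note], student: Student) -> float:
    """Fraction of notes for which the student returns the gold label."""
    correct = sum(student(prompt, text) == label for text, label in notes)
    return correct / len(notes)


def refine_prompt(initial_prompt: str, notes: List[Note], student: Student,
                  teacher: Teacher, max_rounds: int = 16) -> Tuple[str, float]:
    """Let the teacher rewrite the prompt for up to max_rounds rounds."""
    best_prompt = initial_prompt
    best_score = accuracy(best_prompt, notes, student)
    for _ in range(max_rounds):
        # Collect the notes the student currently gets wrong.
        errors: List[Error] = []
        for text, label in notes:
            predicted = student(best_prompt, text)
            if predicted != label:
                errors.append((text, label, predicted))
        if not errors:
            break
        candidate = teacher(best_prompt, errors)
        score = accuracy(candidate, notes, student)
        # Assumed acceptance rule: keep the candidate only if it scores higher.
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score


def toy_student(prompt: str, text: str) -> str:
    """Toy stand-in for the student: finds only symptoms named in the prompt."""
    for word in prompt.lower().split():
        token = word.strip(",.:")
        if token in SYMPTOMS and token in text.lower():
            return token
    return "none"


def toy_teacher(prompt: str, errors: List[Error]) -> str:
    """Toy stand-in for the teacher: adds the first missed label to the prompt."""
    missed_label = errors[0][1]
    return f"{prompt} {missed_label}"


NOTES: List[Note] = [
    ("Patient reports fatigue after treatment.", "fatigue"),
    ("Complains of nausea.", "nausea"),
    ("Mild dysuria noted.", "dysuria"),
]

if __name__ == "__main__":
    prompt, score = refine_prompt("Extract the symptom:", NOTES,
                                  toy_student, toy_teacher)
    print(prompt, score)
```

In a real setting, `toy_student` and `toy_teacher` would be replaced by calls to the two LLMs, and the score would be computed on held-out notes rather than on the notes used for refinement.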

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Residual Connection · Dropout · Layer Normalization · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Softmax · Absolute Position Encodings · Linear Layer