Hybrid Student-Teacher Large Language Model Refinement for Cancer Toxicity Symptom Extraction

Reza Khanmohammadi; Ahmed I. Ghanem; Kyle Verdecchia; Ryan Hall; Mohamed Elshaikh; Benjamin Movsas; Hassan Bagher-Ebadian; Bing Luo; Indrin J. Chetty; Tuka Alhanai; Kundan Thind; and Mohammad M. Ghassemi

arXiv:2408.04775·cs.CL·August 12, 2024

Open Access

TL;DR

This paper presents a novel iterative refinement method using a student-teacher architecture to improve compact LLMs for cancer toxicity symptom extraction, achieving high accuracy at significantly reduced costs.

Contribution

The paper introduces a dynamic selection strategy that combines prompt refinement, RAG, and fine-tuning to optimize small LLMs for clinical symptom extraction.

Findings

01. RAG raised average accuracy from 0.32 to 0.73 for Zephyr-7b-beta and from 0.40 to 0.87 for Phi3-mini-128.

02. The student models gained roughly 0.20 in accuracy on the test set.

03. Refinement cost was 45 to 79 times lower than with GPT-4o.

Abstract

Large Language Models (LLMs) offer significant potential for clinical symptom extraction, but their deployment in healthcare settings is constrained by privacy concerns, computational limitations, and operational costs. This study investigates the optimization of compact LLMs for cancer toxicity symptom extraction using a novel iterative refinement approach. We employ a student-teacher architecture, utilizing Zephyr-7b-beta and Phi3-mini-128 as student models and GPT-4o as the teacher, to dynamically select between prompt refinement, Retrieval-Augmented Generation (RAG), and fine-tuning strategies. Our experiments on 294 clinical notes covering 12 post-radiotherapy toxicity symptoms demonstrate the effectiveness of this approach. The RAG method proved most efficient, improving average accuracy scores from 0.32 to 0.73 for Zephyr-7b-beta and from 0.40 to 0.87 for Phi3-mini-128 during…
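The student-teacher loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `StudentState`, `refine`, `toy_predict`, `toy_teacher`, and `toy_revise` are hypothetical, the student and teacher are stand-in functions rather than Zephyr-7b-beta, Phi3-mini-128, or GPT-4o, and the three notes are invented toy data. It only shows the control flow of evaluating the student, letting a teacher pick one of the three strategies, and applying it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Note = Tuple[str, str]  # (note text, gold label) -- toy format, an assumption
STRATEGIES = ("prompt_refinement", "rag", "fine_tuning")


@dataclass
class StudentState:
    """Hypothetical container for everything the refinement loop may change."""
    prompt: str
    retrieval_examples: List[Note] = field(default_factory=list)
    finetune_rounds: int = 0


def refine(
    state: StudentState,
    notes: List[Note],
    predict: Callable[[StudentState, str], str],
    choose_strategy: Callable[[StudentState, List[Note]], str],
    revise_prompt: Callable[[str, List[Note]], str],
    max_rounds: int = 3,
    target: float = 0.9,
) -> Tuple[StudentState, List[float]]:
    """Evaluate the student, then let the teacher pick and apply one strategy."""
    history: List[float] = []
    for round_idx in range(max_rounds + 1):
        # Collect the notes the student currently gets wrong.
        errors = [(text, gold) for text, gold in notes if predict(state, text) != gold]
        history.append(1 - len(errors) / len(notes))
        if not errors or history[-1] >= target or round_idx == max_rounds:
            break
        strategy = choose_strategy(state, errors)
        if strategy == "prompt_refinement":
            state.prompt = revise_prompt(state.prompt, errors)
        elif strategy == "rag":
            # Store corrected examples so later predictions can retrieve them.
            state.retrieval_examples.extend(errors)
        elif strategy == "fine_tuning":
            # Placeholder: a real system would launch a training job here.
            state.finetune_rounds += 1
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return state, history


# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_predict(state: StudentState, text: str) -> str:
    """Stand-in student: answers from retrieved examples, else guesses 'absent'."""
    return dict(state.retrieval_examples).get(text, "absent")


def toy_teacher(state: StudentState, errors: List[Note]) -> str:
    """Stand-in teacher: always selects RAG."""
    return "rag"


def toy_revise(prompt: str, errors: List[Note]) -> str:
    """Stand-in prompt rewriter."""
    return prompt + " Answer only 'present' or 'absent'."


TOY_NOTES: List[Note] = [
    ("fatigue noted", "present"),
    ("no nausea", "absent"),
    ("reports dysuria", "present"),
]

if __name__ == "__main__":
    final_state, accuracy_history = refine(
        StudentState(prompt="Is the symptom present?"),
        TOY_NOTES, toy_predict, toy_teacher, toy_revise,
    )
    print(accuracy_history)
```

The toy run scores the student on the same notes it retrieves from, so the accuracy it reports is refinement-set accuracy only; a held-out test set would be needed to measure generalization.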

Taxonomy

Topics: Risk and Safety Analysis

Methods: Attention Is All You Need · Byte Pair Encoding · Softmax · Dense Connections · Dropout · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay