Distilling Large Language Models for Matching Patients to Clinical Trials
Mauro Nievas, Aditya Basu, Yanshan Wang, Hrituraj Singh

TL;DR
This study evaluates the effectiveness of both proprietary and open-source large language models in matching patients to clinical trials, demonstrating that fine-tuned open-source models can match proprietary ones, thus enabling more accessible healthcare applications.
Contribution
It systematically compares proprietary and open-source LLMs for patient-trial matching and introduces a fine-tuning approach with synthetic data to enhance open-source model performance.
Findings
Open-source LLMs, when fine-tuned, perform comparably to proprietary models.
Fine-tuning with synthetic data improves open-source LLM effectiveness.
Released datasets and models support further research in healthcare NLP.
Abstract
The recent success of large language models (LLMs) has paved the way for their adoption in the high-stakes domain of healthcare. Specifically, the application of LLMs in patient-trial matching, which involves assessing patient eligibility against a clinical trial's nuanced inclusion and exclusion criteria, has shown promise. Recent research has shown that GPT-3.5, a widely recognized LLM developed by OpenAI, can outperform existing methods with minimal 'variable engineering' by simply comparing clinical trial information against patient summaries. However, there are significant challenges associated with using closed-source proprietary LLMs like GPT-3.5 in practical healthcare applications, such as cost, privacy, and reproducibility concerns. To address these issues, this study presents the first systematic examination of the efficacy of both proprietary (GPT-3.5 and GPT-4) and…
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Adam · Attention Dropout
