Benchmarking LLMs and SLMs for patient reported outcomes

Matteo Marengo; Jarod Lévy; Jean-Emmanuel Bibault

arXiv:2412.16291·cs.AI·December 24, 2024

TL;DR

This paper compares large language models (LLMs) with small language models (SLMs) that can run locally at summarizing patient-reported outcomes in radiotherapy, emphasizing privacy, accuracy, and clinical utility.

Contribution

It provides a comprehensive benchmark of SLMs versus LLMs for medical summarization tasks, highlighting their respective strengths and limitations.

Findings

1. SLMs show promise for privacy-preserving medical summarization.

2. LLMs outperform SLMs in accuracy but pose privacy concerns.

3. Both model classes have limitations in high-stakes clinical applications.

Abstract

LLMs have transformed the execution of numerous tasks, including those in the medical domain. Among these, summarizing patient-reported outcomes (PROs) into concise natural language reports is of particular interest to clinicians, as it enables them to focus on critical patient concerns and spend more time in meaningful discussions. While existing work with LLMs like GPT-4 has shown impressive results, real breakthroughs could arise from leveraging SLMs as they offer the advantage of being deployable locally, ensuring patient data privacy and compliance with healthcare regulations. This study benchmarks several SLMs against LLMs for summarizing patient-reported Q&A forms in the context of radiotherapy. Using various metrics, we evaluate their precision and reliability. The findings highlight both the promise and limitations of SLMs for high-stakes medical tasks, fostering more…

Taxonomy

Topics: Clinical practice guidelines implementation

Methods: Linear Layer · Dense Connections · Residual Connection · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Dropout · Softmax