Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization

Dave Van Veen; Cara Van Uden; Louis Blankemeier; Jean-Benoit Delbrouck; Asad Aali; Christian Bluethgen; Anuj Pareek; Malgorzata Polacin; Eduardo Pontes Reis; Anna Seehofnerova; Nidhi Rohatgi; Poonam Hosamani; William Collins; Neera Ahuja; Curtis P. Langlotz; Jason Hom; Sergios Gatidis; John Pauly; Akshay S. Chaudhari

arXiv:2309.07430 · cs.CL · April 15, 2024 · 35 cites

Open Access · 1 repository

TL;DR

This study demonstrates that adapted large language models can outperform medical experts in clinical text summarization tasks, potentially reducing clinicians' documentation burden and improving workflow efficiency.

Contribution

The paper applies adaptation methods to eight large language models across four clinical summarization tasks and shows that the adapted models can surpass medical experts in summary quality.
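This page does not detail the adaptation methods. In-context learning is one widely used, training-free way to adapt an LLM to a task: a few example input/summary pairs are placed in the prompt ahead of the new input. The sketch below shows only that prompt-assembly step. The function name `build_icl_prompt` and the "Input:/Summary:" template are hypothetical, not the authors' code.

```python
from typing import List, Tuple


def build_icl_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot (in-context learning) prompt.

    Hypothetical template: each example is rendered as an input/summary
    pair, and the new input is appended with an empty summary slot that
    the model is expected to complete.
    """
    parts = [instruction.strip(), ""]
    # Render each worked example as a numbered input/summary pair.
    for i, (source, summary) in enumerate(examples, start=1):
        parts.append(f"Example {i}")
        parts.append(f"Input: {source.strip()}")
        parts.append(f"Summary: {summary.strip()}")
        parts.append("")
    # The new input goes last, leaving "Summary:" open for the model.
    parts.append(f"Input: {query.strip()}")
    parts.append("Summary:")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = build_icl_prompt(
        "Summarize the radiology findings into a concise impression.",
        [("Lungs are clear. No pleural effusion.", "No acute cardiopulmonary process.")],
        "Mild cardiomegaly. No focal consolidation.",
    )
    print(demo)
```

Passing an empty `examples` list yields a zero-shot prompt, so the number of examples can be varied without changing the template.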

Findings

01. LLMs achieved higher scores than experts in most cases.

02. In the clinical reader study, summaries from the best adapted LLMs were equivalent (45%) or superior (36%) to those from clinicians.

03. A safety analysis identified common errors and potential medical harms.

Abstract

Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP), their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Quantitative assessments with syntactic, semantic, and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with ten physicians evaluates summary completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to…
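The abstract groups its quantitative evaluation into syntactic, semantic, and conceptual NLP metrics but does not name them here. To illustrate the syntactic category, the sketch below computes ROUGE-L F1, a common summarization metric based on the longest common subsequence (LCS) of words shared by a candidate and a reference summary. The lowercase whitespace tokenization and the function names are assumptions for illustration. This is not the paper's evaluation code, and library implementations (such as Google's `rouge-score` package) apply their own tokenization and optional stemming.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (DP)."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 using lowercase whitespace tokenization."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate words in the LCS
    recall = lcs / len(ref)      # share of reference words in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 3 of 4 words match in order, so precision = recall = 0.75.
    print(rouge_l_f1("no acute cardiopulmonary process",
                     "no acute cardiopulmonary abnormality"))  # 0.75
```

ROUGE-L rewards only in-order word overlap, so a clinically equivalent paraphrase can score poorly. This limitation is one reason syntactic scores are usually paired with semantic and conceptual metrics, as the abstract describes.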

Code & Models

Repositories

stanfordmimi/clin-summ
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques

Methods: Focus · ALIGN