A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Stefan Hegselmann; Shannon Zejiang Shen; Florian Gierse; Monica Agrawal; David Sontag; Xiaoyi Jiang

arXiv:2402.15422 · cs.CL · June 26, 2024 · 6 cites

TL;DR

This paper explores how large language models can generate accurate and high-quality patient summaries from doctors' notes, emphasizing the importance of training data quality to reduce hallucinations and improve faithfulness.

Contribution

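It introduces a labeling protocol for errors in medical texts and a public dataset of annotated hallucinations in 100 doctor-written and 100 generated summaries, and shows that fine-tuning on hallucination-free data reduces hallucinations (from 2.60 to 1.55 per summary for Llama 2).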

Findings

1. Fine-tuning Llama 2 on hallucination-free data reduces hallucinations from 2.60 to 1.55 per summary; GPT-4 shows a similar drop (0.70 to 0.40) when its few-shot examples are hallucination-free.

2. Common metrics do not reliably measure faithfulness or quality.

3. GPT-4 outperforms baselines in automatic hallucination detection.

Abstract

Patients often face difficulties in understanding their hospitalizations, while healthcare workers have limited resources to provide explanations. In this work, we investigate the potential of large language models to generate patient summaries based on doctors' notes and study the effect of training data on the faithfulness and quality of the generated summaries. To this end, we release (i) a rigorous labeling protocol for errors in medical texts and (ii) a publicly available dataset of annotated hallucinations in 100 doctor-written and 100 generated summaries. We show that fine-tuning on hallucination-free data effectively reduces hallucinations from 2.60 to 1.55 per summary for Llama 2, while preserving relevant information. We observe a similar effect on GPT-4 (0.70 to 0.40) when the few-shot examples are hallucination-free. We also conduct a qualitative evaluation using…
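
To put the reported per-summary figures on a common scale, the relative reduction can be computed directly from the numbers in the abstract. The following is a minimal sketch, not code from the paper or its repository; the function name `relative_reduction` and the dictionary labels are hypothetical and chosen only for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a mean hallucinations-per-summary figure dropped.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Mean hallucinations per summary as quoted in the abstract:
# (with the original data, with hallucination-free data).
reported = {
    "Llama 2 (fine-tuning data)": (2.60, 1.55),
    "GPT-4 (few-shot examples)": (0.70, 0.40),
}

for setting, (before, after) in reported.items():
    pct = 100 * relative_reduction(before, after)
    print(f"{setting}: {before:.2f} -> {after:.2f} ({pct:.1f}% fewer)")
```

Under these figures the drop is about 40.4% for Llama 2 and about 42.9% for GPT-4, so the two settings show a similar relative effect even though their absolute rates differ.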

Code & Models

Repositories

stefanhgm/patient_summaries_with_llms (PyTorch, official)

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Adam · Softmax · Layer Normalization · Multi-Head Attention