A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models
Stefan Hegselmann, Shannon Zejiang Shen, Florian Gierse, Monica Agrawal, David Sontag, Xiaoyi Jiang

TL;DR
This paper explores how large language models can generate accurate and high-quality patient summaries from doctors' notes, emphasizing the importance of training data quality to reduce hallucinations and improve faithfulness.
Contribution
It introduces a new labeling protocol and dataset for hallucinations in medical summaries, demonstrating that training on hallucination-free data reduces errors significantly.
Findings
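Fine-tuning on hallucination-free data reduces hallucinations for Llama 2 (2.60 to 1.55 per summary); using hallucination-free few-shot examples has a similar effect for GPT-4 (0.70 to 0.40).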
Common metrics do not reliably measure faithfulness or quality.
GPT-4 outperforms baselines in automatic hallucination detection.
Abstract
Patients often face difficulties in understanding their hospitalizations, while healthcare workers have limited resources to provide explanations. In this work, we investigate the potential of large language models to generate patient summaries based on doctors' notes and study the effect of training data on the faithfulness and quality of the generated summaries. To this end, we release (i) a rigorous labeling protocol for errors in medical texts and (ii) a publicly available dataset of annotated hallucinations in 100 doctor-written and 100 generated summaries. We show that fine-tuning on hallucination-free data effectively reduces hallucinations from 2.60 to 1.55 per summary for Llama 2, while preserving relevant information. We observe a similar effect on GPT-4 (0.70 to 0.40), when the few-shot examples are hallucination-free. We also conduct a qualitative evaluation using…
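To put the reported averages on a common scale, the following minimal sketch computes the relative reduction in hallucinations per summary from the figures quoted in the abstract. The function name `relative_reduction` and the dictionary labels are illustrative assumptions, not part of the paper's released code; only the four numbers come from the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which the mean hallucination count per summary dropped."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Mean hallucinations per summary as quoted in the abstract.
# The keys are illustrative labels (an assumption of this sketch).
reported = {
    "Llama 2 (fine-tuning)": (2.60, 1.55),
    "GPT-4 (few-shot)": (0.70, 0.40),
}

reductions = {
    name: relative_reduction(before, after)
    for name, (before, after) in reported.items()
}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1%} fewer hallucinations per summary")
```

Under these figures both settings see a relative drop of roughly 40 percent, although the absolute counts for GPT-4 are much lower to begin with.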
Taxonomy
Topics: Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Adam · Softmax · Layer Normalization · Multi-Head Attention
