Large Language Model-Based Generation of Discharge Summaries
Tiago Rodrigues, Carla Teixeira Lopes

TL;DR
This study evaluates five large language models for automatically generating discharge summaries from medical notes. It finds that proprietary models such as Gemini excel in accuracy and utility, though challenges such as hallucinations remain.
Contribution
The paper compares open-source LLMs (Mistral, Llama 2) and proprietary LLMs (GPT-3, GPT-4, Gemini 1.5 Pro) on discharge summary generation, and shows that proprietary models perform best, particularly Gemini with one-shot prompting.
Findings
Proprietary models outperform open-source models in summary accuracy.
Fine-tuning improves open-source model performance.
Human evaluation confirms the clinical utility of proprietary model summaries.
Abstract
Discharge Summaries are documents written by medical professionals that detail a patient's visit to a care facility. They contain a wealth of information crucial for patient care, and automating their generation could significantly reduce the effort required from healthcare professionals, minimize errors, and ensure that critical patient information is easily accessible and actionable. In this work, we explore the use of five Large Language Models on this task, from open-source models (Mistral, Llama 2) to proprietary systems (GPT-3, GPT-4, Gemini 1.5 Pro), leveraging MIMIC-III summaries and notes. We evaluate them using exact-match, soft-overlap, and reference-free metrics. Our results show that proprietary models, particularly Gemini with one-shot prompting, outperformed others, producing summaries with the highest similarity to the gold-standard ones. Open-source models, while…
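To make the "exact-match" family of metrics mentioned in the abstract concrete, here is a minimal sketch of a ROUGE-1-style unigram overlap F1 between a generated summary and a gold-standard one. The function names `tokenize` and `unigram_overlap_f1` and the simple lowercase alphanumeric tokenization are assumptions made for illustration. The text above does not specify which metrics or implementations the authors used, so this should not be read as their evaluation code.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenization: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: harmonic mean of unigram precision and recall."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "Patient discharged home"
    gold = "Patient discharged"
    print(round(unigram_overlap_f1(generated, gold), 3))
```

A score like this only rewards identical tokens. That is why soft-overlap metrics, which credit semantically similar wording, and reference-free metrics are typically reported alongside it.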
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Artificial Intelligence in Healthcare and Education
