Extrinsically-Focused Evaluation of Omissions in Medical Summarization
Elliot Schumacher, Daniel Rosenthal, Dhruv Naik, Varun Nair, Luladay Price, Geoffrey Tso, Anitha Kannan

TL;DR
This paper introduces MED-OMIT, a new metric for evaluating omissions in medical summaries generated by large language models, focusing on clinical relevance and agreement with experts.
Contribution
The paper presents MED-OMIT, a novel evaluation metric for medical summarization that quantifies clinically relevant omissions and compares model performance to expert judgment.
Findings
MED-OMIT aligns well with clinical experts' assessments
GPT-4 and Llama-3.1-405b perform effectively in generating summaries
Llama 2 shows comparatively lower performance in the evaluation
Abstract
Large language models (LLMs) have shown promise in safety-critical applications such as healthcare, yet the ability to quantify performance has lagged. An example of this challenge is in evaluating a summary of the patient's medical record. Such a summary can enable the provider to get a high-level overview of the patient's health status quickly. Yet a summary that omits important facts about the patient's record can produce a misleading picture, which can have negative consequences for medical decision-making. We propose MED-OMIT as a metric to explore this challenge. We focus on using provider-patient history conversations to generate a subjective (a summary of the patient's history) as a case study. We begin by discretizing facts from the dialogue and identifying which are omitted from the subjective. To determine which facts are clinically relevant, we measure the importance…
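The first two steps named in the abstract (discretizing dialogue facts, then identifying which are missing from the subjective) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `naive_is_supported` and `find_omitted_facts`, the example facts, and the word-overlap check are hypothetical and are not the paper's method. The sketch does not cover how MED-OMIT scores clinical importance.

```python
from typing import Callable, List


def naive_is_supported(fact: str, summary: str) -> bool:
    """Toy stand-in for a coverage check (assumption, not the paper's method).

    A fact counts as covered only if every one of its words appears
    somewhere in the summary.
    """
    summary_words = set(summary.lower().split())
    return all(word in summary_words for word in fact.lower().split())


def find_omitted_facts(
    facts: List[str],
    summary: str,
    is_supported: Callable[[str, str], bool] = naive_is_supported,
) -> List[str]:
    """Return the dialogue facts the summary does not cover, in input order."""
    return [fact for fact in facts if not is_supported(fact, summary)]


# Hypothetical facts, assumed to be already extracted from a dialogue.
dialogue_facts = ["chest pain", "pain for three days", "takes aspirin daily"]
subjective = "patient reports chest pain for three days"

omitted = find_omitted_facts(dialogue_facts, subjective)
omission_rate = len(omitted) / len(dialogue_facts)
```

The coverage check is passed in as a parameter so that the word-overlap placeholder could be swapped for a stronger judge without changing the surrounding logic.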
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: LLaMA · Focus · Sparse Evolutionary Training
