Extrinsically-Focused Evaluation of Omissions in Medical Summarization

Elliot Schumacher; Daniel Rosenthal; Dhruv Naik; Varun Nair; Luladay Price; Geoffrey Tso; Anitha Kannan

arXiv:2311.08303 · cs.CL · November 13, 2024 · 1 citation

TL;DR

This paper introduces MED-OMIT, a new metric for evaluating omissions in medical summaries generated by large language models, focusing on clinical relevance and agreement with experts.

Contribution

The paper presents MED-OMIT, a novel evaluation metric for medical summarization that quantifies clinically relevant omissions and compares model performance to expert judgment.

Findings

1. MED-OMIT aligns well with clinical experts' assessments
2. GPT-4 and Llama-3.1-405b perform effectively in generating summaries
3. Llama 2 shows comparatively lower performance in the evaluation

Abstract

Large language models (LLMs) have shown promise in safety-critical applications such as healthcare, yet the ability to quantify performance has lagged. An example of this challenge is in evaluating a summary of the patient's medical record. A resulting summary can enable the provider to get a high-level overview of the patient's health status quickly. Yet, a summary that omits important facts about the patient's record can produce a misleading picture. This can lead to negative consequences on medical decision-making. We propose MED-OMIT as a metric to explore this challenge. We focus on using provider-patient history conversations to generate a subjective (a summary of the patient's history) as a case study. We begin by discretizing facts from the dialogue and identifying which are omitted from the subjective. To determine which facts are clinically relevant, we measure the importance…
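As a rough illustration of the first two steps the abstract names (splitting the dialogue into discrete facts, then flagging those missing from the subjective), here is a minimal sketch. It is not the paper's method: sentence splitting and token overlap are crude stand-ins for whatever fact extraction and matching MED-OMIT actually uses, and the function names, the `threshold` parameter, and the example text are all hypothetical. The importance-scoring step is left out because the abstract is truncated before describing it.

```python
import re


def split_facts(dialogue: str) -> list[str]:
    """Toy stand-in for fact discretization: treat each sentence as one fact."""
    parts = re.split(r"(?<=[.!?])\s+", dialogue.strip())
    return [p for p in parts if p]


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for a crude overlap check."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def omitted_facts(facts: list[str], summary: str, threshold: float = 0.5) -> list[str]:
    """Return facts whose token coverage in the summary falls below `threshold`.

    Assumption: lexical overlap approximates "the summary states this fact".
    A real system would need semantic matching instead.
    """
    summary_tokens = _tokens(summary)
    omitted = []
    for fact in facts:
        fact_tokens = _tokens(fact)
        if not fact_tokens:
            continue
        coverage = len(fact_tokens & summary_tokens) / len(fact_tokens)
        if coverage < threshold:
            omitted.append(fact)
    return omitted


# Hypothetical example data, invented for illustration only.
dialogue = (
    "Patient reports chest pain for two days. "
    "Patient takes aspirin daily. "
    "Patient has no fever."
)
summary = "Patient reports chest pain for two days and has no fever."

facts = split_facts(dialogue)
missing = omitted_facts(facts, summary)
print(missing)
```

In this toy case the medication fact is the only one the summary fails to cover. Deciding whether such an omission is clinically relevant is a separate step that this sketch does not attempt.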

Code & Models

Repositories

curai/curai-research (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies

Methods: LLaMA · Focus · Sparse Evolutionary Training