Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction
Shivali Dalmia, Ananya Mantravadi, Prasanna Desikan

TL;DR
This paper systematically evaluates large language models for extracting post-discharge clinical actions from narrative discharge notes, comparing their performance to supervised models and analyzing annotation challenges.
Contribution
It introduces a two-stage extraction framework and provides a comprehensive assessment of LLMs versus supervised models in clinical action extraction tasks.
Findings
LLMs perform comparably to supervised models on binary action detection.
Supervised models outperform LLMs on multi-label category classification.
Annotation inconsistencies and lack of reasoning annotations hinder model evaluation.
Abstract
This paper evaluates zero-shot and few-shot large language models (LLMs) for safety-critical clinical action extraction on the CLIP discharge-note dataset, with particular emphasis on transitions of care and post-discharge patient safety. To manage the complexity of clinical documentation, we introduce a two-stage extraction framework that uses a staged prompting strategy to decompose narrative discharge notes into fine-grained, explicitly actionable clinical tasks. Our contributions include a systematic assessment of generative LLMs for clinical action extraction, a detailed comparison between general-purpose LLMs and task-specific supervised BERT-based models, and an analysis of annotation inconsistencies across action categories. We show that contemporary LLMs achieve performance comparable to or exceeding supervised models on…
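The two-stage idea described above (first detect whether a sentence carries an action, then assign one or more categories to it) can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the prompt wording, the function names, the naive sentence splitter, the `CATEGORIES` labels, and the `keyword_stub` stand-in for a model are all hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List

# Illustrative labels only; the summary above does not list the real label set.
CATEGORIES = ["appointment", "medication", "lab"]

# Any function that maps a prompt string to a reply string (assumed interface).
LLM = Callable[[str], str]


def split_sentences(note: str) -> List[str]:
    """Naive period-based splitter, good enough for a sketch."""
    return [s.strip() for s in note.replace("\n", " ").split(".") if s.strip()]


def detect_action(sentence: str, llm: LLM) -> bool:
    """Stage 1: binary decision on whether the sentence contains an action."""
    prompt = (
        "Stage 1. Answer yes or no: does this sentence describe a "
        f"post-discharge action?\nSentence: {sentence}"
    )
    return llm(prompt).strip().lower().startswith("yes")


def classify_action(sentence: str, llm: LLM) -> List[str]:
    """Stage 2: multi-label categorisation of a sentence flagged in stage 1."""
    prompt = (
        "Stage 2. List every category that applies, chosen from: "
        f"{', '.join(CATEGORIES)}.\nSentence: {sentence}"
    )
    reply = llm(prompt).lower()
    # Keep only labels from the allowed set that appear in the reply.
    return [c for c in CATEGORIES if c in reply]


def extract_actions(note: str, llm: LLM) -> Dict[str, List[str]]:
    """Run stage 1 on every sentence and stage 2 only on flagged sentences."""
    results: Dict[str, List[str]] = {}
    for sentence in split_sentences(note):
        if detect_action(sentence, llm):
            results[sentence] = classify_action(sentence, llm)
    return results


def keyword_stub(prompt: str) -> str:
    """Deterministic stand-in for a model so the sketch runs offline."""
    sentence = prompt.split("Sentence:", 1)[1].lower()
    if prompt.startswith("Stage 1"):
        return "yes" if ("follow up" in sentence or "take" in sentence) else "no"
    labels = []
    if "follow up" in sentence:
        labels.append("appointment")
    if "take" in sentence:
        labels.append("medication")
    return ", ".join(labels)


if __name__ == "__main__":
    note = (
        "Patient was admitted with chest pain. "
        "Follow up with cardiology in two weeks. Take aspirin daily."
    )
    for sentence, labels in extract_actions(note, keyword_stub).items():
        print(sentence, labels)
```

Running stage 2 only on sentences flagged in stage 1 keeps the two decisions separable, which mirrors the split between binary action detection and multi-label category classification reported in the findings. In practice, `keyword_stub` would be replaced by a call to whichever model is under evaluation.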
