How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting
Parker Seegmiller, Joseph Gatto, Sarah E. Greer, Ganza Belise Isingizwe, Rohan Ray, Timothy E. Burdick, Sarah Masud Preum

TL;DR
This study evaluates how well large language models can generate patient message responses aligned with individual clinicians' preferences, highlighting the challenges and potential strategies for improving their integration into clinical workflows.
Contribution
The paper introduces a taxonomy of thematic elements in clinician responses and an evaluation framework for assessing the clinician editing load of LLM-drafted responses, along with an expert-annotated dataset and a large-scale evaluation of adaptation techniques.
Findings
LLMs show variable ability to generate clinician-aligned responses across themes.
Theme-driven adaptation improves LLM performance in most response themes.
Significant epistemic uncertainty exists in aligning LLM outputs with clinician preferences.
Abstract
Large language models (LLMs) show promise in drafting responses to patient portal messages, yet their integration into clinical workflows raises various concerns, including whether they would actually save clinicians time and effort in their portal workload. We investigate LLM alignment with individual clinicians through a comprehensive evaluation of the patient message response drafting task. We develop a novel taxonomy of thematic elements in clinician responses and propose a novel evaluation framework for assessing clinician editing load of LLM-drafted responses at both content and theme levels. We release an expert-annotated dataset and conduct large-scale evaluations of local and commercial LLMs using various adaptation techniques including thematic prompting, retrieval-augmented generation, supervised fine-tuning, and direct preference optimization. Our results reveal substantial…
Taxonomy
Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Artificial Intelligence in Healthcare and Education
