Do Physicians Know How to Prompt? The Need for Automatic Prompt Optimization Help in Clinical Note Generation
Zonghai Yao, Ahmed Jaafar, Beining Wang, Zhichao Yang, Hong Yu

TL;DR
This paper introduces an Automatic Prompt Optimization (APO) framework to improve large language model performance in clinical note generation, and reports more consistent prompt quality with minimal expert input.
Contribution
The study presents a novel APO framework that refines prompts for LLMs in clinical settings, combining automated optimization with expert customization for better results.
Findings
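GPT4 APO standardizes prompt quality across clinical note sections better than the other prompt sources compared (medical experts, non-medical experts, and GPT3.5 APO).
Experts maintain content quality after APO but prefer their own modifications, which suggests that expert customization adds value.
A two-phase process is recommended: APO-GPT4 for consistency, then expert input for personalization.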
Abstract
This study examines the effect of prompt engineering on the performance of Large Language Models (LLMs) in clinical note generation. We introduce an Automatic Prompt Optimization (APO) framework to refine initial prompts and compare the outputs of medical experts, non-medical experts, and APO-enhanced GPT3.5 and GPT4. Results highlight GPT4 APO's superior performance in standardizing prompt quality across clinical note sections. A human-in-the-loop approach shows that experts maintain content quality post-APO, with a preference for their own modifications, suggesting the value of expert customization. We recommend a two-phase optimization process, leveraging APO-GPT4 for consistency and expert input for personalization.
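The two-phase process recommended above can be sketched as follows. This is a minimal illustration and not the authors' APO algorithm: the function names (`optimize_prompt`, `two_phase`), the `refine`, `score`, and `expert_edit` callables, and the toy stand-ins below are all assumptions made for the example. In practice `refine` would call an LLM such as GPT4 and `score` would compare generated notes against references.

```python
from typing import Callable, Optional


def optimize_prompt(
    initial_prompt: str,
    refine: Callable[[str], str],
    score: Callable[[str], float],
    rounds: int = 3,
) -> str:
    """Phase 1 (automatic): refine repeatedly, keep the best-scoring prompt seen."""
    best, best_score = initial_prompt, score(initial_prompt)
    current = initial_prompt
    for _ in range(rounds):
        current = refine(current)          # hypothetical LLM-backed rewrite step
        current_score = score(current)     # hypothetical quality metric
        if current_score > best_score:
            best, best_score = current, current_score
    return best


def two_phase(
    initial_prompt: str,
    refine: Callable[[str], str],
    score: Callable[[str], float],
    expert_edit: Optional[Callable[[str], str]] = None,
    rounds: int = 3,
) -> str:
    """Phase 1 for consistency, then an optional expert edit for personalization."""
    automated = optimize_prompt(initial_prompt, refine, score, rounds)
    return expert_edit(automated) if expert_edit else automated


# Toy stand-ins so the sketch runs without any model access (assumptions only).
REQUIRED_TERMS = ("assessment", "plan")


def toy_refine(prompt: str) -> str:
    """Add an instruction for the first required term the prompt is missing."""
    for term in REQUIRED_TERMS:
        if term not in prompt.lower():
            return f"{prompt} Include the {term}."
    return prompt


def toy_score(prompt: str) -> float:
    """Count how many required terms the prompt mentions."""
    return float(sum(term in prompt.lower() for term in REQUIRED_TERMS))


def toy_expert_edit(prompt: str) -> str:
    """Simulate a clinician appending a personal preference."""
    return f"{prompt} Use the abbreviations common in my clinic."


if __name__ == "__main__":
    print(two_phase("Summarize the visit.", toy_refine, toy_score, toy_expert_edit))
```

Keeping the expert edit as a separate, final step mirrors the reported finding that experts prefer their own modifications: the automatic phase sets a consistent baseline, and the expert adjusts it last.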
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
