Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
Darren Liu, Cheng Ding, Delgersuren Bold, Monique Bouvier, Jiaying Lu, Benjamin Shickel, Craig S. Jabaley, Wenhui Zhang, Soojin Park, Michael J. Young, Mark S. Wainwright, Gilles Clermont, Parisa Rashidi, Eric S. Rosenthal, Laurie Dimisko, Ran Xiao, Joo Heung Yoon, Carl Yang

TL;DR
This study systematically evaluates large language models' ability to understand complex clinical notes in adult critical care, highlighting GPT-4's superior performance and proposing a comprehensive evaluation framework for healthcare applications.
Contribution
Introduces a novel, comprehensive evaluation framework for LLMs in healthcare, incorporating clinician annotations and benchmarking across complex clinical contexts.
Findings
GPT-4 outperforms other LLMs in clinical note understanding.
Prompting strategies significantly improve LLM performance.
The evaluation framework is effective for assessing LLMs in medical domains.
Abstract
The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks don't fully capture the nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in real-world healthcare settings. Objective: We sought to evaluate the performance of LLMs in the complex clinical context of adult critical care medicine using systematic and comprehensible analytic methods, including clinician annotation and adjudication. Methods: We investigated the performance of three general LLMs in understanding and processing real-world clinical notes. Concepts from 150 clinical notes were identified by MetaMap and then labeled by 9 clinicians. Each LLM's proficiency was…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
Methods: Attention Is All You Need · Discriminative Fine-Tuning · Cosine Annealing · Byte Pair Encoding · Adam · Label Smoothing · Linear Layer · Multi-Head Attention
