EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models
João Matos, Jack Gallifant, Jian Pei, A. Ian Wong

TL;DR
EHRmonize is a framework that uses large language models to efficiently abstract medical concepts from electronic health records, significantly reducing annotation time and aiding clinical data processing.
Contribution
The paper introduces EHRmonize, a novel framework leveraging LLMs for medical concept abstraction from EHRs, demonstrating high accuracy and efficiency improvements.
Findings
GPT-4o achieved 97% accuracy in identifying generic route names
GPT-4o achieved 82% accuracy in extracting generic drug names
GPT-4o achieved 100% accuracy in binary classification of antibiotics
Abstract
Electronic health records (EHRs) contain vast amounts of complex data, but harmonizing and processing this information remains a challenging and costly task requiring significant clinical expertise. While large language models (LLMs) have shown promise in various healthcare applications, their potential for abstracting medical concepts from EHRs remains largely unexplored. We introduce EHRmonize, a framework leveraging LLMs to abstract medical concepts from EHR data. Our study uses medication data from two real-world EHR databases to evaluate five LLMs on two free-text extraction and six binary classification tasks across various prompting strategies. GPT-4o with 10-shot prompting achieved the highest performance in all tasks, accompanied by Claude-3.5-Sonnet in a subset of tasks. GPT-4o achieved an accuracy of 97% in identifying generic route names, 82% for generic drug names, and…
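The few-shot prompting and accuracy scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not EHRmonize's API: the function names (`build_few_shot_prompt`, `parse_binary`, `accuracy`), the prompt layout, and the medication strings are all hypothetical and chosen only for illustration. The call to an actual LLM is deliberately left out, so the sketch covers only prompt assembly and evaluation.

```python
from typing import List, Sequence, Tuple


def build_few_shot_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a plain-text few-shot prompt: instruction, worked examples, then the query.

    The "Input:/Output:" layout is an assumption, not the format used in the paper.
    """
    lines = [task, ""]
    for raw_text, label in examples:
        lines.append(f"Input: {raw_text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


def parse_binary(response: str) -> bool:
    """Map a free-text model reply onto a yes/no label (True means "yes")."""
    return response.strip().lower().startswith("yes")


def accuracy(predictions: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty label set")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Invented example records, not taken from the study's databases.
shots = [
    ("vancomycin 1 g IV", "yes"),
    ("acetaminophen 500 mg tab", "no"),
]
prompt = build_few_shot_prompt(
    "Is this drug an antibiotic? Answer yes or no.",
    shots,
    "ceftriaxone 2 g IV",
)
```

A 10-shot prompt in this sketch would simply pass ten labeled pairs in `shots`. The reply returned by whichever model is used would go through `parse_binary`, and the parsed labels would be compared against clinician annotations with `accuracy`.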
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Machine Learning in Healthcare
