EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models

João Matos; Jack Gallifant; Jian Pei; A. Ian Wong

arXiv:2407.00242 · cs.CL · July 2, 2024

TL;DR

EHRmonize is a framework that uses large language models to efficiently abstract medical concepts from electronic health records, significantly reducing annotation time and aiding clinical data processing.

Contribution

The paper introduces EHRmonize, a novel framework leveraging LLMs for medical concept abstraction from EHRs, demonstrating high accuracy and efficiency improvements.

Findings

01. GPT-4o achieved 97% accuracy in identifying generic route names.

02. GPT-4o achieved 82% accuracy in extracting generic drug names.

03. GPT-4o achieved 100% accuracy in the binary classification of antibiotics.

Abstract

Electronic health records (EHRs) contain vast amounts of complex data, but harmonizing and processing this information remains a challenging and costly task requiring significant clinical expertise. While large language models (LLMs) have shown promise in various healthcare applications, their potential for abstracting medical concepts from EHRs remains largely unexplored. We introduce EHRmonize, a framework leveraging LLMs to abstract medical concepts from EHR data. Our study uses medication data from two real-world EHR databases to evaluate five LLMs on two free-text extraction and six binary classification tasks across various prompting strategies. GPT-4o with 10-shot prompting achieved the highest performance in all tasks, joined by Claude-3.5-Sonnet in a subset of tasks. GPT-4o achieved an accuracy of 97% in identifying generic route names, 82% for generic drug names, and…
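As a rough illustration of the evaluation setup the abstract describes (few-shot prompting followed by accuracy scoring), the sketch below assembles a k-shot prompt for a binary classification task and scores predictions against labels. It is not the EHRmonize API: the function names, the prompt wording, and the example medication strings are all assumptions made for illustration, and no model is actually called.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    task: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
    n_shots: int = 10,
) -> str:
    """Assemble a plain-text prompt from up to n_shots labeled examples.

    Hypothetical helper: the layout ("Input:" / "Output:") is an assumption,
    not the prompt format used in the paper.
    """
    lines = [task, ""]
    for raw_entry, label in examples[:n_shots]:
        lines += [f"Input: {raw_entry}", f"Output: {label}", ""]
    # The final entry is left unanswered for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions matching labels, ignoring case and whitespace."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    hits = sum(
        p.strip().lower() == y.strip().lower() for p, y in zip(predictions, labels)
    )
    return hits / len(labels)


# Invented examples for illustration only; not taken from the study's data.
shots = [("vancomycin 1 g IV", "yes"), ("acetaminophen 500 mg PO", "no")]
prompt = build_few_shot_prompt(
    "Is the medication an antibiotic? Answer yes or no.",
    shots,
    "ceftriaxone 2 g IV",
)

# Stand-in model outputs; a real run would collect these from an LLM.
score = accuracy(["Yes", "no", "yes"], ["yes", "no", "no"])
```

Passing `n_shots=0` yields a zero-shot prompt with the same layout, which makes it straightforward to compare prompting strategies on a shared labeled set.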

Code & Models

Repositories

aiwonglab/ehrmonize (official)

Datasets

AIWongLab/ehrmonize
dataset · 8 downloads

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Machine Learning in Healthcare