EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Yeonsu Kwon; Jiho Kim; Gyubok Lee; Seongsu Bae; Daeun Kyung; Wonchul Cha; Tom Pollard; Alistair Johnson; Edward Choi

arXiv:2406.16341 · cs.CL · December 31, 2024

TL;DR

EHRCon is a new dataset and framework designed to detect inconsistencies between unstructured clinical notes and structured data in electronic health records, enhancing data reliability and patient safety.

Contribution

The paper introduces EHRCon, a novel dataset with manual annotations for consistency checking, and CheckEHR, a large language model-based framework for verifying data consistency in EHRs.

Findings

01. CheckEHR shows promising results in few-shot and zero-shot settings.

02. EHRCon includes 4,101 annotated entities across 105 clinical notes.

03. Two schema versions increase dataset applicability.

Abstract

Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system designs and human errors, posing serious risks to patient safety. To address this, we developed EHRCon, a new dataset and task specifically designed to ensure data consistency between structured tables and unstructured notes in EHRs. EHRCon was crafted in collaboration with healthcare professionals using the MIMIC-III EHR dataset, and includes manual annotations of 4,101 entities across 105 clinical notes checked against database entries for consistency. EHRCon has two versions, one using the…
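To make the consistency-checking task concrete, the following is a minimal sketch of what checking one note entity against a structured table could look like. It is not the CheckEHR framework, which is LLM-based; it is a plain exact-match lookup over a toy in-memory SQLite table. The table name `prescriptions`, its columns, the sample rows, and the helper `entity_is_consistent` are all hypothetical and chosen only for illustration.

```python
import sqlite3


def build_toy_db():
    """Create a toy in-memory table (hypothetical schema, illustrative rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prescriptions (hadm_id INTEGER, drug TEXT, dose_val TEXT)"
    )
    conn.executemany(
        "INSERT INTO prescriptions VALUES (?, ?, ?)",
        [(100, "aspirin", "81"), (100, "metoprolol", "25")],
    )
    return conn


def entity_is_consistent(conn, hadm_id, drug, dose_val):
    """Return True if a row matching the note entity exists for this admission."""
    row = conn.execute(
        "SELECT 1 FROM prescriptions "
        "WHERE hadm_id = ? AND LOWER(drug) = LOWER(?) AND dose_val = ? LIMIT 1",
        (hadm_id, drug, dose_val),
    ).fetchone()
    return row is not None


conn = build_toy_db()

# Entities as they might be extracted from a clinical note (made-up values).
note_entities = [
    {"drug": "Aspirin", "dose_val": "81"},     # matches a table row
    {"drug": "Metoprolol", "dose_val": "50"},  # dose differs from the table
]

labels = [
    entity_is_consistent(conn, 100, e["drug"], e["dose_val"])
    for e in note_entities
]
print(labels)
```

In this sketch an entity is labeled consistent only when an exactly matching row exists; real notes would also need handling for abbreviations, units, and timestamps, which an exact-match lookup does not cover.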

Code & Models

Repositories

dustn1259/ehrcon (PyTorch, official)

Videos

EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records · SlidesLive

Taxonomy

Topics: Machine Learning in Healthcare