EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records
Yeonsu Kwon, Jiho Kim, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi

TL;DR
EHRCon is a new dataset and framework designed to detect inconsistencies between unstructured clinical notes and structured data in electronic health records, enhancing data reliability and patient safety.
Contribution
The paper introduces EHRCon, a novel dataset with manual annotations for consistency checking, and CheckEHR, a large language model-based framework for verifying data consistency in EHRs.
Findings
CheckEHR shows promising results in few-shot and zero-shot settings.
EHRCon includes 4,101 annotated entities across 105 clinical notes.
Two schema versions increase dataset applicability.
Abstract
Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system designs and human errors, posing serious risks to patient safety. To address this, we developed EHRCon, a new dataset and task specifically designed to ensure data consistency between structured tables and unstructured notes in EHRs. EHRCon was crafted in collaboration with healthcare professionals using the MIMIC-III EHR dataset, and includes manual annotations of 4,101 entities across 105 clinical notes checked against database entries for consistency. EHRCon has two versions, one using the…
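To make the consistency-checking task concrete, the following is a minimal sketch, not the authors' CheckEHR framework. It assumes that entities have already been extracted from a note, and it checks each one against a toy table. The table name `prescriptions`, its columns, the function `check_entity`, and all sample values are hypothetical and simplified for illustration; they are not taken from the paper or from the actual MIMIC-III schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prescriptions (hadm_id INTEGER, drug TEXT, dose REAL)"
    )
    conn.executemany(
        "INSERT INTO prescriptions VALUES (?, ?, ?)",
        [(100, "Aspirin", 81.0), (100, "Metoprolol", 25.0)],
    )
    conn.commit()
    return conn


def check_entity(conn, hadm_id, drug, dose):
    """Label a note entity 'consistent' if a matching table row exists."""
    row = conn.execute(
        "SELECT 1 FROM prescriptions "
        "WHERE hadm_id = ? AND LOWER(drug) = LOWER(?) AND dose = ? LIMIT 1",
        (hadm_id, drug, dose),
    ).fetchone()
    return "consistent" if row else "inconsistent"


conn = build_demo_db()

# Entities as they might be extracted from a clinical note (made-up values).
note_entities = [
    {"hadm_id": 100, "drug": "aspirin", "dose": 81.0},     # matches a row
    {"hadm_id": 100, "drug": "metoprolol", "dose": 50.0},  # dose differs
]

labels = [check_entity(conn, **entity) for entity in note_entities]
for entity, label in zip(note_entities, labels):
    print(entity["drug"], entity["dose"], "->", label)
```

An exact-match query like this is only the simplest possible check. Real notes involve abbreviations, implicit dates, and unit variation, which is presumably why the paper turns to a language-model-based approach.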
Taxonomy
Topics: Machine Learning in Healthcare
