DR.EHR: Dense Retrieval for Electronic Health Record with Knowledge Injection and Synthetic Data
Zhengyun Zhao, Huaiyuan Ying, Yue Zhong, Sheng Yu

TL;DR
This paper presents DR.EHR, a series of dense retrieval models tailored for electronic health records. Training combines medical knowledge injection with LLM-generated synthetic data, and the resulting models outperform existing dense retrievers on the CliniQ benchmark.
Contribution
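The paper introduces a two-stage training pipeline for EHR retrieval built on MIMIC-IV discharge summaries. The first stage extracts medical entities and injects knowledge from a biomedical knowledge graph; the second stage uses large language models to generate diverse training data. Two model variants, with 110M and 7B parameters, achieve state-of-the-art results.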
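As a rough illustration of what knowledge injection in the first stage might look like, the sketch below expands the entities found in a note chunk with related terms from a knowledge graph and emits (query term, chunk) training pairs. The function name `build_stage_one_pairs`, the toy graph, and the pairing scheme are all assumptions made for illustration; the summary above does not specify how the authors construct their training pairs.

```python
def build_stage_one_pairs(chunk_id, chunk_entities, knowledge_graph):
    """Return (query_term, chunk_id) pairs for one note chunk.

    Hypothetical helper: each extracted entity, plus any terms the
    knowledge graph links to it, becomes a query that should retrieve
    the chunk. This is an assumed scheme, not the paper's procedure.
    """
    pairs = []
    seen = set()
    for entity in chunk_entities:
        # The entity itself, followed by its related terms in the graph.
        candidates = [entity] + knowledge_graph.get(entity, [])
        for term in candidates:
            if term not in seen:  # avoid duplicate queries for one chunk
                seen.add(term)
                pairs.append((term, chunk_id))
    return pairs


# Toy knowledge graph (made-up entries, for illustration only).
toy_kg = {
    "myocardial infarction": ["heart attack", "MI"],
    "metformin": ["type 2 diabetes"],
}

pairs = build_stage_one_pairs(
    "chunk_001", ["myocardial infarction", "metformin"], toy_kg
)
for term, chunk in pairs:
    print(f"{term!r} -> {chunk}")
```

Under this assumed scheme, a query such as "heart attack" is paired with a chunk that only mentions "myocardial infarction", which is one way a retriever could learn to bridge the semantic gap the abstract describes.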
Findings
DR.EHR outperforms existing dense retrievers on the CliniQ benchmark.
Knowledge injection and synthetic data generation enhance retrieval performance.
Models demonstrate strong generalization on EHR QA datasets.
Abstract
Electronic Health Records (EHRs) are pivotal in clinical practices, yet their retrieval remains a challenge mainly due to semantic gap issues. Recent advancements in dense retrieval offer promising solutions but existing models, both general-domain and biomedical-domain, fall short due to insufficient medical knowledge or mismatched training corpora. This paper introduces DR.EHR, a series of dense retrieval models specifically tailored for EHR retrieval. We propose a two-stage training pipeline utilizing MIMIC-IV discharge summaries to address the need for extensive medical knowledge and large-scale training data. The first stage involves medical entity extraction and knowledge injection from a biomedical knowledge graph, while the second stage employs large language models to generate diverse training data. We train two variants of DR.EHR, with 110M and 7B parameters,…
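For readers unfamiliar with dense retrieval, the following minimal sketch shows the general scoring step: a query and each note chunk are represented as vectors, and chunks are ranked by cosine similarity to the query. The embeddings here are hand-written toy values and the names `cosine` and `rank_chunks` are hypothetical; in practice the vectors would come from a trained encoder such as the models described above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_chunks(query_vec, chunk_vecs):
    """Return chunk ids sorted from most to least similar to the query."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored]


# Toy 3-dimensional embeddings (made-up values, not model output).
chunk_vecs = {
    "note_chunk_a": [0.9, 0.1, 0.0],
    "note_chunk_b": [0.1, 0.9, 0.2],
    "note_chunk_c": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

ranking = rank_chunks(query_vec, chunk_vecs)
print(ranking)
```

Because matching happens in embedding space rather than on surface strings, a query can retrieve a chunk that shares no exact words with it, provided the encoder places them close together.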
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
