Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding
Kyunghoon Hur, Jiyoung Lee, Jungwoo Oh, Wesley Price, Young-Hak Kim, Edward Choi

TL;DR
This paper introduces DescEmb, a text-based embedding framework for electronic health records that unifies heterogeneous systems and improves transferability across hospitals using neural language models.
Contribution
The paper presents a novel code-agnostic embedding method for EHR that leverages textual descriptions, enabling unified modeling across diverse systems.
Findings
DescEmb outperforms traditional code-based embedding in extensive experiments.
The gain is most pronounced in zero-shot transfer from one hospital to another.
A single unified model can be trained on heterogeneous EHR datasets.
Abstract
EHR systems lack a unified code system for representing medical concepts, which acts as a barrier for the deployment of deep learning models in large scale to multiple clinics and hospitals. To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic representation learning framework for EHR. DescEmb takes advantage of the flexibility of neural language understanding models to embed clinical events using their textual descriptions rather than directly mapping each event to a dedicated embedding. DescEmb outperformed traditional code-based embedding in extensive experiments, especially in a zero-shot transfer task (one hospital to another), and was able to train a single unified model for heterogeneous EHR datasets.
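The contrast the abstract draws between a dedicated per-code embedding and an embedding computed from an event's textual description can be illustrated with a minimal sketch. This is not the authors' implementation: the hashed bag-of-words encoder below is only a toy stand-in for a neural language model, and the hospital codes and descriptions are hypothetical, chosen for illustration.

```python
import hashlib
import math

DIM = 32  # one dimension per SHA-256 digest byte; arbitrary for this sketch


def token_vector(token):
    """Map a token to a fixed pseudo-random vector (toy stand-in for a learned text encoder)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest]


def embed_description(description):
    """Embed an event from its text: sum the token vectors, then L2-normalize."""
    vec = [0.0] * DIM
    for token in description.lower().split():
        for i, x in enumerate(token_vector(token)):
            vec[i] += x
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical code -> description tables for two hospitals (not real codes).
hospital_a = {"LAB_0412": "glucose blood chemistry", "MED_0077": "heparin sodium injection"}
hospital_b = {"X-9931": "glucose blood chemistry test"}

# Code-based embedding: one dedicated vector per code seen in hospital A.
code_table = {code: token_vector(code) for code in hospital_a}
unseen = code_table.get("X-9931")  # None: hospital B's code has no embedding

# Description-based embedding: any code with a description can be embedded.
glucose_a = embed_description(hospital_a["LAB_0412"])
heparin_a = embed_description(hospital_a["MED_0077"])
glucose_b = embed_description(hospital_b["X-9931"])

sim_same_concept = cosine(glucose_a, glucose_b)
sim_other_concept = cosine(heparin_a, glucose_b)
print(unseen, sim_same_concept > sim_other_concept)
```

In this sketch the lookup table has no entry for the second hospital's code, while the description-based path embeds it from shared vocabulary alone. That is the property the abstract associates with zero-shot transfer and with training one model across heterogeneous datasets.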
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques
