Semantic Decomposition Improves Learning of Large Language Models on EHR Data

David A. Bloore; Romane Gauriau; Anna L. Decker; Jacob Oppenheim

arXiv:2212.06040·cs.CL·December 13, 2022

TL;DR

This paper introduces H-BERT, a method that decomposes hierarchical medical codes in EHR data into semantic units connected by hierarchical graphs, significantly improving prediction of patient membership in over 500 diagnosis classes and yielding distinct patient representations for closely related phenotypes.

Contribution

The paper presents H-BERT, a new approach that incorporates complete hierarchical graph expansions of medical codes into BERT, improving predictive performance and phenotypic differentiation.

Findings

01. Improved prediction of patient membership in over 500 diagnosis classes.

02. Distinct representations of patients in closely related but clinically distinct phenotypes.

03. Significant gains in aggregated AUC and APS.

Abstract

Electronic health records (EHR) are widely believed to hold a profusion of actionable insights, encrypted in an irregular, semi-structured format, amidst a loud noise background. To simplify learning patterns of health and disease, medical codes in EHR can be decomposed into semantic units connected by hierarchical graphs. Building on earlier synergy between Bidirectional Encoder Representations from Transformers (BERT) and Graph Attention Networks (GAT), we present H-BERT, which ingests complete graph tree expansions of hierarchical medical codes as opposed to only ingesting the leaves and pushes patient-level labels down to each visit. This methodology significantly improves prediction of patient membership in over 500 medical diagnosis classes as measured by aggregated AUC and APS, and creates distinct representations of patients in closely related but clinically distinct phenotypes.
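The two input-side ideas in the abstract, expanding each code into its full hierarchy path rather than keeping only the leaf, and copying patient-level labels down to every visit, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `expand_code` and `push_labels_down` are hypothetical, the prefix-truncation rule assumes ICD-10-style codes (real ontologies may define parents differently), and the BERT and GAT model components are not shown.

```python
from typing import Dict, List


def expand_code(code: str) -> List[str]:
    """Return an illustrative root-to-leaf path for an ICD-10-style code.

    Assumption: ancestors are prefixes of the code, e.g.
    E11.65 -> E, E11, E11.6, E11.65. This is a simplification and
    not necessarily the hierarchy used in the paper.
    """
    compact = code.replace(".", "")
    if not compact:
        raise ValueError("empty code")
    # Cut points: first character, the 3-character category, then one
    # level per additional character; always include the full code.
    cuts = sorted({1, len(compact), *range(3, len(compact) + 1)})
    path = []
    for cut in cuts:
        prefix = compact[:cut]
        # Re-insert the dot after the 3-character category.
        path.append(prefix if cut <= 3 else prefix[:3] + "." + prefix[3:])
    return path


def push_labels_down(
    visits: List[List[str]], patient_labels: List[str]
) -> List[Dict[str, List[str]]]:
    """Attach the patient-level labels to every visit.

    Each visit is a list of leaf codes; the output holds the expanded
    tokens for that visit plus a copy of the patient's labels.
    """
    return [
        {
            "tokens": [node for code in visit for node in expand_code(code)],
            "labels": list(patient_labels),
        }
        for visit in visits
    ]


if __name__ == "__main__":
    # Hypothetical patient with two visits and one patient-level label.
    examples = push_labels_down([["E11.65"], ["I10"]], ["diabetes"])
    for example in examples:
        print(example)
```

In this sketch, a leaf-only pipeline would see just `E11.65`, while the expanded input also exposes the shared ancestors `E` and `E11`, which is the kind of semantic decomposition the abstract describes.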

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Biomedical Text Mining and Ontologies