Enhancing Phenotype Discovery in Electronic Health Records through Prior Knowledge-Guided Unsupervised Learning
Melanie Mayer, Kimberly Lactaoen, Gary E. Weissman, Blanca E. Himes, Rebecca A. Hubbard

TL;DR
This paper introduces a Bayesian latent class framework that incorporates clinical knowledge into unsupervised learning of EHR data, enhancing interpretability and identifying meaningful disease sub-phenotypes, exemplified by a T2 inflammation-related asthma subgroup.
Contribution
It presents a novel method for integrating domain-specific knowledge into Bayesian clustering of EHR data, improving phenotype discovery and interpretability.
Findings
Identified a distinct T2 inflammation-related asthma sub-phenotype.
Demonstrated the model's ability to handle missing data and uncertainty.
Revealed a bimodal distribution indicating clear class separation.
Abstract
Objectives: Unsupervised learning with electronic health record (EHR) data has shown promise for phenotype discovery, but approaches typically disregard existing clinical information, limiting interpretability. We operationalize a Bayesian latent class framework for phenotyping that incorporates domain-specific knowledge to improve clinical meaningfulness of EHR-derived phenotypes and illustrate its utility by identifying an asthma sub-phenotype informed by features of Type 2 (T2) inflammation.
Materials and methods: We illustrate a framework for incorporating clinical knowledge into a Bayesian latent class model via informative priors to guide unsupervised clustering toward clinically relevant subgroups. This approach models missingness, accounting for potential missing-not-at-random patterns, and provides patient-level probabilities for phenotype assignment with uncertainty. Using…
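The idea described under Materials and methods can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the model's exact likelihood or inference procedure, so the code below assumes binary features, two latent classes, Beta priors with parameters above 1, and a simple MAP estimate via EM in place of full posterior inference (so it yields point-estimate probabilities without the uncertainty intervals the paper describes). The function name `fit_latent_class_map`, the prior values, and the simulated data are all hypothetical. Missingness is handled by letting the probability that a feature is missing depend on the latent class, which is one common way to allow for missing-not-at-random patterns.

```python
import numpy as np

def fit_latent_class_map(X, prior_a, prior_b, n_iter=200):
    """MAP-EM for a 2-class latent class model with binary features.

    X: (n, p) array with entries 0, 1, or np.nan (missing).
    prior_a, prior_b: (2, p) Beta prior parameters (all > 1) for
        P(feature = 1 | class); row 1 is the class of interest.
    Returns class-1 prevalence, feature probabilities, missingness
    probabilities, and each patient's P(class 1) from the final E-step.
    """
    n, p = X.shape
    observed = ~np.isnan(X)
    x = np.where(observed, X, 0.0)   # placeholder 0 where missing (masked by obs)
    obs = observed.astype(float)
    mis = 1.0 - obs                  # missingness indicators

    pi = 0.5                                   # P(class 1)
    theta = prior_a / (prior_a + prior_b)      # start at the prior means
    rho = np.tile(np.clip(mis.mean(axis=0), 0.01, 0.99), (2, 1))

    for _ in range(n_iter):
        # E-step: per-class log-likelihood of observed values and of
        # the missingness pattern itself.
        log_lik = (
            (obs * x) @ np.log(theta).T
            + (obs * (1 - x)) @ np.log(1 - theta).T
            + mis @ np.log(rho).T
            + obs @ np.log(1 - rho).T
        )
        log_post = log_lik + np.log([1 - pi, pi])
        log_post -= log_post.max(axis=1, keepdims=True)
        w = np.exp(log_post)
        w /= w.sum(axis=1, keepdims=True)      # (n, 2) responsibilities

        # M-step: posterior modes under the Beta priors.
        pi = (1 + w[:, 1].sum()) / (2 + n)     # Beta(2, 2) prior (assumed)
        pos = w.T @ (obs * x)                  # weighted positive counts
        tot = w.T @ obs                        # weighted observed counts
        theta = (prior_a - 1 + pos) / (prior_a + prior_b - 2 + tot)
        rho = (1 + w.T @ mis) / (2 + w.sum(axis=0)[:, None])

    return pi, theta, rho, w[:, 1]

# Simulated example (hypothetical values): class 1 has frequent positive
# features and is tested more often, so missingness is informative.
rng = np.random.default_rng(0)
n, p = 400, 4
z = rng.random(n) < 0.3
X = (rng.random((n, p)) < np.where(z[:, None], 0.8, 0.1)).astype(float)
X[rng.random((n, p)) < np.where(z[:, None], 0.2, 0.6)] = np.nan

# Informative priors encode "class 1 tends to be positive on every feature".
prior_a = np.array([[2.0] * p, [8.0] * p])
prior_b = np.array([[8.0] * p, [2.0] * p])
pi_hat, theta_hat, rho_hat, post = fit_latent_class_map(X, prior_a, prior_b)
```

Because the priors differ between the two classes, they also fix which class is which, so the fitted class 1 can be read as the knowledge-defined subgroup rather than an arbitrary cluster label.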
Taxonomy
Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Asthma and respiratory diseases
