Unsupervised Probabilistic Models for Sequential Electronic Health Records
Alan D. Kaplan, John D. Greene, Vincent X. Liu, Priyadip Ray

TL;DR
This paper introduces an unsupervised probabilistic model for analyzing heterogeneous and sequential electronic health records, enabling subgroup identification and dynamic analysis of complex medical data.
Contribution
The paper presents a layered mixture model whose latent variables capture subject subgroups and the sequence dynamics underlying heterogeneous EHR data.
Findings
The trained model identifies patient subgroups.
It supports analysis of sequences in relation to mortality risk.
The properties of the trained model offer insight into complex, multifaceted EHR data.
Abstract
We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze…
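To make the layered structure concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that each subgroup (top layer) owns a discrete-emission hidden Markov model whose hidden states play the role of the second-layer unobserved states. The abstract says only that the second layer holds unobserved states for sequences, so the HMM form, the function names, the symbol coding (0 = normal lab result, 1 = abnormal), and every parameter value below are illustrative assumptions. The sketch scores a sequence of arbitrary length under each subgroup and returns the posterior probability of subgroup membership.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(seq, init, trans, emit):
    """Forward algorithm: log p(seq) under one discrete-emission HMM.

    Assumes every probability used is strictly positive (math.log(0) raises).
    """
    n_states = len(init)
    if not seq:
        return 0.0  # an empty sequence has probability 1
    # alpha[s] = log p(observations so far, current hidden state = s)
    alpha = [math.log(init[s]) + math.log(emit[s][seq[0]]) for s in range(n_states)]
    for obs in seq[1:]:
        alpha = [
            logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(n_states)])
            + math.log(emit[s][obs])
            for s in range(n_states)
        ]
    return logsumexp(alpha)


def subgroup_posterior(seq, weights, components):
    """Posterior p(subgroup | seq) for a mixture of HMMs."""
    log_joint = [
        math.log(w) + hmm_log_likelihood(seq, *params)
        for w, params in zip(weights, components)
    ]
    total = logsumexp(log_joint)
    return [math.exp(lj - total) for lj in log_joint]


# Hypothetical parameters: two subgroups, two hidden states, two symbols.
# Each component is (initial distribution, transition matrix, emission matrix).
weights = [0.5, 0.5]
components = [
    ([0.9, 0.1], [[0.9, 0.1], [0.2, 0.8]], [[0.9, 0.1], [0.6, 0.4]]),  # mostly normal
    ([0.1, 0.9], [[0.8, 0.2], [0.1, 0.9]], [[0.4, 0.6], [0.1, 0.9]]),  # mostly abnormal
]

post_abnormal = subgroup_posterior([1, 1, 1, 1], weights, components)
post_normal = subgroup_posterior([0, 0, 0], weights, components)
```

With these made-up parameters, a run of abnormal results places more posterior mass on the second subgroup, and a run of normal results on the first. Fitting the parameters to data (for example with expectation-maximization) is outside the scope of this sketch.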
Taxonomy
Topics: Machine Learning in Healthcare · Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference
