Modeling Heterogeneity and Missing Data of Multiple Longitudinal Outcomes in Electronic Health Records
Rebecca Anthopolos, Ying Wei, Qixuan Chen

TL;DR
This paper introduces a Bayesian shared parameter model for analyzing multiple longitudinal health outcomes in EHR data, accounting for nonignorable missing data mechanisms and heterogeneity among patient subgroups.
Contribution
It develops a novel joint modeling approach linking growth mixture models with visit and response processes, implemented in the R package EHRMiss.
Findings
The model captures heterogeneity across latent patient subgroups.
It accounts for nonignorable missing data arising from the visit and response processes.
Simulations demonstrate accurate subgroup inference.
Abstract
In electronic health records (EHRs), latent subgroups of patients may exhibit distinctive patterning in their longitudinal health trajectories. For such data, growth mixture models (GMMs) enable classifying patients into different latent classes based on individual trajectories and hypothesized risk factors. However, the application of GMMs is hindered by the special missing data problem in EHRs, which manifests two patient-led missing data processes: the visit process and the response process for an EHR variable conditional on a patient visiting the clinic. If either process is associated with the process generating the longitudinal outcomes, then valid inferences require accounting for a nonignorable missing data mechanism. We propose a Bayesian shared parameter model that links GMMs of multiple longitudinal health outcomes, the visit process, and the response process of each outcome…
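To make the two patient-led missing data processes concrete, the following is a minimal simulation sketch in Python. It is not the EHRMiss package or the authors' model: the function name `simulate_ehr`, the two-class setup, the single outcome, and every parameter value are hypothetical choices made for illustration. It generates class-specific linear trajectories with a patient-level random effect, then lets that same random effect (and the latent class) drive both the probability of a clinic visit and the probability that the outcome is recorded given a visit. Because the missingness depends on quantities that also generate the outcome, the mechanism is nonignorable.

```python
import numpy as np


def simulate_ehr(n_patients=500, n_times=6, seed=0):
    """Simulate one longitudinal outcome with visit and response processes.

    All parameter values below are hypothetical and chosen for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Latent class membership: two hypothetical patient subgroups.
    cls = rng.binomial(1, 0.4, size=n_patients)

    # Patient-level random effect shared by the outcome and missingness models.
    b = rng.normal(0.0, 1.0, size=n_patients)

    # Class-specific linear trajectories (the growth mixture part).
    t = np.arange(n_times)
    intercept = np.where(cls == 1, 2.0, 0.0)
    slope = np.where(cls == 1, 0.5, -0.1)
    noise = rng.normal(0.0, 0.5, size=(n_patients, n_times))
    y = intercept[:, None] + slope[:, None] * t[None, :] + b[:, None] + noise

    # Visit process: probability of a clinic visit depends on class and b.
    p_visit = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * cls + 0.7 * b)))
    visit = rng.random((n_patients, n_times)) < p_visit[:, None]

    # Response process: given a visit, the outcome is recorded with a
    # probability that also depends on b.
    p_resp = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * b)))
    resp = visit & (rng.random((n_patients, n_times)) < p_resp[:, None])

    # Observed data: the outcome is missing unless visited and recorded.
    y_obs = np.where(resp, y, np.nan)
    return cls, y, visit, resp, y_obs


if __name__ == "__main__":
    cls, y, visit, resp, y_obs = simulate_ehr()
    print("mean of complete outcomes:", y.mean())
    print("mean of observed outcomes:", np.nanmean(y_obs))
    print("fraction recorded:", resp.mean())
```

In this sketch the mean of the observed outcomes is higher than the mean of the complete outcomes, because patients with higher trajectories visit and get measured more often. An analysis that ignores the visit and response processes would inherit that bias, which is the motivation for modeling them jointly with the outcome.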
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Bayesian Methods and Mixture Models
