Hierarchical Probabilistic Principal Component Analysis of Longitudinal Data
Xinyu Zhang, Ameer Qaqish, D.Y. Lin, Didong Li

TL;DR
This paper introduces HPPCA, a hierarchical probabilistic PCA model designed for longitudinal data with missing values, capturing nested variance and temporal dynamics to improve data imputation and outcome prediction.
Contribution
HPPCA explicitly separates between-subject variation from within-subject variation and models the time-varying within-subject latent factors with a Gaussian process, advancing analysis of incomplete longitudinal datasets.
Findings
HPPCA outperforms standard PPCA and multivariate functional PCA in imputation accuracy.
HPPCA effectively captures hierarchical structure in longitudinal data.
Applying HPPCA to COVID-19 symptoms improved clinical outcome prediction.
Abstract
In many longitudinal studies, a large number of variables are measured repeatedly over time, with substantial missing data. Existing methods, such as probabilistic principal component analysis (PPCA), are ill-equipped to handle such incomplete, high-dimensional longitudinal data, as they fail to account for the nested sources of variation and temporal dependency inherent in repeated measures. We introduce hierarchical probabilistic principal component analysis (HPPCA), a two-level probabilistic factor model that explicitly separates between-subject variance from time-varying within-subject dynamics. The within-subject latent factors are modeled by a Gaussian process. We develop an EM algorithm to handle missing data and flexible covariance kernels, accelerated by computationally efficient initializers. Simulation studies demonstrated that HPPCA robustly recovers model parameters…
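To make the two-level structure concrete, the sketch below simulates data from one possible model of this kind. It is an illustration under stated assumptions, not the authors' specification: the additive form (mean + subject-level factors + time-varying factors + noise), the squared-exponential kernel, the shared time grid, the missing-completely-at-random mask, and every name in the code (simulate_two_level, W_b, W_w, and so on) are hypothetical choices made for this example.

```python
import numpy as np

def rbf_kernel(times, length_scale=1.0):
    """Squared-exponential covariance between all pairs of time points."""
    diff = times[:, None] - times[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def simulate_two_level(n_subjects=50, n_times=6, n_vars=10,
                       q_between=2, q_within=2, noise_sd=0.1,
                       missing_rate=0.3, seed=0):
    """Draw data from an ASSUMED two-level factor model (illustration only)."""
    rng = np.random.default_rng(seed)
    times = np.linspace(0.0, 1.0, n_times)

    # GP covariance over the visit times; jitter keeps the Cholesky stable.
    K = rbf_kernel(times, length_scale=0.3) + 1e-6 * np.eye(n_times)
    L = np.linalg.cholesky(K)

    mu = rng.normal(size=n_vars)                 # variable means
    W_b = rng.normal(size=(n_vars, q_between))   # between-subject loadings
    W_w = rng.normal(size=(n_vars, q_within))    # within-subject loadings

    # Between-subject factors: one vector per subject, constant over time.
    z = rng.normal(size=(n_subjects, q_between))

    # Within-subject factors: an independent GP draw per subject and factor.
    white = rng.normal(size=(n_subjects, n_times, q_within))
    u = np.einsum("ts,isq->itq", L, white)

    noise = noise_sd * rng.normal(size=(n_subjects, n_times, n_vars))
    X = mu + (z @ W_b.T)[:, None, :] + u @ W_w.T + noise

    # Mask entries completely at random to mimic missing measurements.
    mask = rng.random(X.shape) < missing_rate
    X_obs = np.where(mask, np.nan, X)
    return times, X, X_obs

times, X, X_obs = simulate_two_level()
print(X.shape, float(np.isnan(X_obs).mean()))
```

The subject-level term is constant across a subject's visits, while the Gaussian-process term varies smoothly in time, which is the separation of between-subject variance from within-subject dynamics that the abstract describes. Fitting such a model with EM, as the paper does, is not attempted here.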
