Bayesian low-rank latent-cluster regression for mixed health outcomes
Hsin-Hsiung Huang, Suyeon Kang

TL;DR
This paper introduces a Bayesian low-rank latent-cluster regression model for multivariate mixed health outcomes, enabling simultaneous clustering, dimension reduction, and interpretability in high-dimensional, heterogeneous health data.
Contribution
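The paper develops a Bayesian finite mixture of reduced-rank regressions for Gaussian, Bernoulli, and negative binomial outcomes. Multiplicative gamma process shrinkage adapts the effective rank within each cluster, a WAIC-based criterion tunes the number of clusters and the nominal maximal rank, and posterior contraction results provide theoretical guarantees.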
Findings
Accurately recovers the number of clusters in various regimes.
Performs competitively against established clustering methods.
Produces interpretable county- and state-level health outcome maps.
Abstract
High-dimensional health and surveillance studies often involve many collinear predictors, multiple correlated outcomes of different types, and latent heterogeneity across observational units. We propose a Bayesian latent-cluster reduced-rank regression model for multivariate mixed outcomes. The model is a finite mixture of regression surfaces: each latent cluster has a cluster-specific mean shift and a low-rank coefficient matrix, yielding simultaneous clustering, dimension reduction, and component-wise interpretability. Response coordinates may be Gaussian, Bernoulli, or negative binomial. Multiplicative gamma process shrinkage adapts the effective rank within each cluster, and a WAIC-based criterion is used to tune the number of clusters and the nominal maximal rank. We establish posterior contraction for the identifiable component-specific regression surfaces and mean shifts, up to…
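To make the model structure in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It draws data from a finite mixture in which each latent cluster has its own mean shift and a low-rank coefficient matrix, with Gaussian, Bernoulli, and negative binomial response coordinates. Several things here are assumptions made purely for illustration: the function names `mgp_column_scales` and `simulate_mixed_outcomes`, the factorization of each coefficient matrix as `A @ B.T`, the logit and log links, the hyperparameters `a1`, `a2`, and `nb_size`, and the use of the standard multiplicative gamma process construction (cumulative products of gamma variables as column precisions). The paper's exact parameterization may differ.

```python
import numpy as np


def mgp_column_scales(rank, a1=2.0, a2=3.0, rng=None):
    """Prior std devs per factor column from a multiplicative gamma process.

    Assumed standard construction: tau_h = prod_{l <= h} delta_l with
    delta_1 ~ Gamma(a1, 1) and delta_l ~ Gamma(a2, 1) for l >= 2.
    When a2 > 1 the precisions tend to grow with h, shrinking later columns.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = np.concatenate([rng.gamma(a1, 1.0, size=1),
                            rng.gamma(a2, 1.0, size=rank - 1)])
    tau = np.cumprod(delta)
    return 1.0 / np.sqrt(tau)


def simulate_mixed_outcomes(n=300, p=10, n_clusters=2, rank=2,
                            nb_size=5.0, seed=0):
    """Simulate from a latent-cluster reduced-rank model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Hypothetical outcome layout: two coordinates of each response type.
    types = ["gaussian", "gaussian", "bernoulli", "bernoulli",
             "negbin", "negbin"]
    q = len(types)

    X = rng.normal(size=(n, p))             # predictors
    z = rng.integers(n_clusters, size=n)    # latent cluster labels
    shifts = rng.normal(size=(n_clusters, q))  # cluster-specific mean shifts

    coefs = np.empty((n_clusters, p, q))
    for k in range(n_clusters):
        scales = mgp_column_scales(rank, rng=rng)
        A = rng.normal(size=(p, rank)) * scales  # shrink later columns
        B = rng.normal(size=(q, rank))
        coefs[k] = A @ B.T / np.sqrt(p)          # rank at most `rank`

    # Linear predictor: shift plus cluster-specific regression surface.
    eta = shifts[z] + np.einsum("ip,ipq->iq", X, coefs[z])
    eta = np.clip(eta, -5.0, 5.0)  # keep exp() stable in this toy example

    Y = np.empty((n, q))
    for j, kind in enumerate(types):
        if kind == "gaussian":
            Y[:, j] = eta[:, j] + rng.normal(size=n)
        elif kind == "bernoulli":
            prob = 1.0 / (1.0 + np.exp(-eta[:, j]))   # logit link (assumed)
            Y[:, j] = rng.binomial(1, prob)
        else:
            mean = np.exp(eta[:, j])                  # log link (assumed)
            # NumPy's (n, p) parameterization has mean n * (1 - p) / p.
            Y[:, j] = rng.negative_binomial(nb_size, nb_size / (nb_size + mean))
    return X, Y, z, coefs, types
```

Calling `simulate_mixed_outcomes(n=50, p=8, n_clusters=3)` returns the design matrix, the mixed-type response matrix, the true cluster labels, and the per-cluster coefficient matrices. Simulated data of this kind can serve as a sanity check for any fitting routine, since the true labels and ranks are known.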
