Regression and Dimension Reduction for Multivariate Mixed-Type Data via Semiparametric Gaussian Copula
Debangan Dey, Vadim Zipunnikov

TL;DR
This paper introduces a semiparametric Gaussian copula framework for analyzing multivariate mixed-type data, enabling regression, PCA, and imputation with improved computational efficiency and theoretical guarantees.
Contribution
The paper develops novel bridging results, efficient algorithms, and methods for latent space regression, PCA, and missing data imputation for mixed-type data.
Findings
Effective modeling of mixed-type variables using a latent Gaussian copula.
Significant reduction in computational complexity from O(n^4) to O(n log n).
Successful application to NHANES data linking frailty measures to mortality.
Abstract
Clinical and epidemiological studies encode participant information in multivariate vectors with mixed-type variables on continuous, truncated, ordinal, and binary scales. Semiparametric Gaussian Copula (SGC) assumes that observed data are generated by latent multivariate normal random variables whose marginals are monotonically transformed and then truncated/ordinalized/binarized. In SGC, the latent correlation matrix fully determines the dependence structure, and it is estimated by inverting "bridges" between Kendall's tau rank correlations of observed variables and latent correlations. By employing SGC, we develop regression (SGC-Reg), principal component analysis (SGC-PCA), and principal component regression (SGC-PCR) for latent representations of observed data. To build our framework, we make several key contributions: i) establishing novel bridging results for general…
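To make the "bridge" idea in the abstract concrete, here is a minimal sketch of the simplest case only: two continuous variables, where the classical Gaussian copula relation between Kendall's tau and the latent correlation r is tau = (2/pi) * arcsin(r), so r = sin(pi * tau / 2). This is not the paper's code. The function names (`kendall_tau`, `invert_continuous_bridge`, `simulate`) are hypothetical, the tau computation is the naive O(n^2) version rather than a fast algorithm, and the bridges for truncated, ordinal, and binary variables that the paper develops are not reproduced here.

```python
import math
import random


def kendall_tau(x, y):
    """Naive O(n^2) Kendall's tau for two samples (assumes no ties)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def invert_continuous_bridge(tau):
    """Invert tau = (2/pi) * arcsin(r) for a continuous-continuous pair."""
    return math.sin(math.pi * tau / 2.0)


def simulate(n, rho, seed=0):
    """Draw latent bivariate normals with correlation rho, then apply
    strictly increasing transforms so the observed margins are non-Gaussian.
    The transforms (exp and cube) are arbitrary illustrative choices."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(math.exp(z1))
        ys.append(z2 ** 3)
    return xs, ys


xs, ys = simulate(500, 0.6)
tau = kendall_tau(xs, ys)
rho_hat = invert_continuous_bridge(tau)
print(f"Kendall's tau = {tau:.3f}, estimated latent correlation = {rho_hat:.3f}")
```

Because Kendall's tau depends only on ranks, the monotone transforms leave it unchanged, which is why the latent correlation can be recovered without estimating the transforms themselves.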
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Mental Health Research Topics
