Mixed and missing data: a unified treatment with latent graphical models
Xiao Li, Jinzhu Jia, Yuan Yao

TL;DR
This paper introduces a unified latent Gaussian graphical model for data with mixed (continuous and categorical) variables and missing values. The fitted model supports imputation and downstream tasks such as regression and classification.
Contribution
It develops a latent Gaussian model for mixed and missing data, fitted with the Expectation-Maximization algorithm and sparse inverse covariance estimation, and reports better prediction and imputation than existing methods.
Findings
Outperforms state-of-the-art methods on medical datasets
Lower prediction error than random forest when the model is correctly specified
More effective than hot deck imputation even when the model is misspecified
Abstract
We propose to learn latent graphical models when data have mixed variables and missing values. This model could be used for further data analysis, including regression, classification, ranking, etc. It could also be used for imputing missing values. We specify a latent Gaussian model for the data, where the categorical variables are generated by discretizing an unobserved variable and the latent variables are multivariate Gaussian. The observed data consists of two parts: observed Gaussian variables and observed categorical variables, where the latter part is considered as partially missing Gaussian variables. We use the Expectation-Maximization algorithm to fit the model. To prevent overfitting, we use sparse inverse covariance estimation to obtain a sparse estimate of the latent covariance matrix or, equivalently, the graphical model. The fitted model could then be used for problems…
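The idea that a categorical variable is a "partially missing" Gaussian can be made concrete with a small sketch. The code below is an illustration only, not the authors' implementation: the two-variable setup, the threshold values, the correlation `rho`, and all function names are assumptions chosen for the example. It samples one continuous variable and one categorical variable from a latent bivariate Gaussian, then shows that an observed category pins the latent value to an interval, whose conditional mean is the kind of quantity an E-step would need. The paper's E-step would also condition on the other variables; this sketch is the univariate, standard-normal simplification.

```python
import math
import random


def discretize(z, thresholds):
    """Category index = number of thresholds the latent value meets or exceeds."""
    return sum(z >= t for t in thresholds)


def sample_mixed(n, rho, thresholds, seed=0):
    """Draw n rows (continuous value, category) from a latent bivariate Gaussian.

    Both latent coordinates are standard normal with correlation rho
    (an assumed toy setup). The second coordinate is never observed
    directly, only its discretized category.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        rows.append((z1, discretize(z2, thresholds)))
    return rows


def latent_interval(category, thresholds):
    """Interval [a, b) of latent values that map to the given category."""
    bounds = [-math.inf, *thresholds, math.inf]
    return bounds[category], bounds[category + 1]


def std_normal_pdf(x):
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_mean(a, b):
    """E[Z | a < Z < b] for a standard normal Z (marginal case only)."""
    mass = std_normal_cdf(b) - std_normal_cdf(a)
    return (std_normal_pdf(a) - std_normal_pdf(b)) / mass


if __name__ == "__main__":
    thresholds = [-0.5, 0.5]  # hypothetical cut points giving three categories
    for value, category in sample_mixed(5, rho=0.6, thresholds=thresholds):
        a, b = latent_interval(category, thresholds)
        print(f"x={value:+.3f}  category={category}  "
              f"latent in [{a}, {b})  E[z | category]={truncated_mean(a, b):+.3f}")
```

In a full fit, conditional moments of this kind would feed the M-step, where a sparse inverse covariance estimator is applied to the completed second-moment matrix.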
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Modeling and Causal Inference · Statistical Methods and Bayesian Inference
