Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types
Benjamin Christoffersen, Mark Clements, Keith Humphreys, Hedvig Kjellström

TL;DR
This paper introduces a highly accurate and efficient Gaussian copula model for imputing missing data of mixed types, extending previous models to support unordered multinomial variables and reducing approximation errors.
Contribution
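It replaces the fast but potentially imprecise approximation used in earlier Gaussian copula models with direct, arbitrarily precise approximations based on randomized quasi-Monte Carlo procedures, for both model estimation and imputation, and it extends Gaussian copula models to include unordered multinomial variables.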
Findings
Lower errors in parameter estimation and imputation compared to previous methods
Supports unordered multinomial variables in Gaussian copula models
Achieves faster and more accurate imputation for mixed data types
Abstract
Missing values in data of mixed types are a common problem in a large number of machine learning applications, such as the processing of surveys and various medical applications. Recently, Gaussian copula models have been suggested as a means of performing imputation of missing values using a probabilistic framework. While the present Gaussian copula models have been shown to yield state-of-the-art performance, they have two limitations: they are based on an approximation that is fast but may be imprecise, and they do not support unordered multinomial variables. We address the first limitation using direct and arbitrarily precise approximations, both for model estimation and imputation, by using randomized quasi-Monte Carlo procedures. The method we provide has lower errors for the estimated model parameters and the imputed values, compared to previously proposed methods. We also extend the…
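The kind of integral that a randomized quasi-Monte Carlo procedure approximates in this setting can be illustrated with a small, self-contained sketch. It estimates a multivariate normal probability P(Z_1 <= b_1, ..., Z_d <= b_d), the type of quantity that appears in Gaussian copula likelihoods when some variables are ordinal or binary. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`van_der_corput`, `cholesky`, `mvn_cdf_rqmc`), the use of a randomly shifted Halton point set, and the sequential-conditioning transform in the style of Genz. The authors' actual point set and implementation may differ.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and inv_cdf


def van_der_corput(index, base):
    """Radical inverse of `index` in `base` (one coordinate of a Halton point)."""
    result, frac = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * frac
        frac /= base
    return result


def cholesky(sigma):
    """Lower-triangular Cholesky factor of a positive definite matrix (list of lists)."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def mvn_cdf_rqmc(upper, sigma, n_points=2048, n_shifts=8, seed=0):
    """Estimate P(Z <= upper) for Z ~ N(0, sigma); returns (estimate, standard error).

    Hypothetical helper: Halton points with independent random shifts
    (Cranley-Patterson rotations) supply the randomization.
    """
    primes = [2, 3, 5, 7, 11, 13]
    d = len(upper)
    if d - 1 > len(primes):
        raise ValueError("this sketch supports at most 7 dimensions")
    rng = random.Random(seed)
    L = cholesky(sigma)
    # The integral is (d - 1)-dimensional after the first variable is handled exactly.
    points = [[van_der_corput(i + 1, primes[k]) for k in range(d - 1)]
              for i in range(n_points)]
    estimates = []
    for _ in range(n_shifts):
        shift = [rng.random() for _ in range(d - 1)]
        total = 0.0
        for pt in points:
            e = STD_NORMAL.cdf(upper[0] / L[0][0])
            f = e
            y = []
            for i in range(1, d):
                u = (pt[i - 1] + shift[i - 1]) % 1.0
                # Clamp so inv_cdf never receives exactly 0 or 1.
                p = min(max(u * e, 1e-12), 1.0 - 1e-12)
                y.append(STD_NORMAL.inv_cdf(p))  # draw from the truncated normal
                mean = sum(L[i][j] * y[j] for j in range(i))
                e = STD_NORMAL.cdf((upper[i] - mean) / L[i][i])
                f *= e
            total += f / n_points
        estimates.append(total)
    est = sum(estimates) / n_shifts
    var = sum((x - est) ** 2 for x in estimates) / (n_shifts - 1)
    return est, math.sqrt(var / n_shifts)


if __name__ == "__main__":
    # Equicorrelated trivariate normal with correlation 0.5:
    # the orthant probability P(Z <= 0) has the closed form 1/4.
    sigma = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
    print(mvn_cdf_rqmc([0.0, 0.0, 0.0], sigma))
```

The independent random shifts are what make the procedure "randomized": each shifted point set gives an unbiased estimate, so the spread across shifts yields a standard error, and increasing `n_points` or `n_shifts` tightens the approximation as far as needed. The example at the bottom can be checked against the known orthant probability of 1/4 for an equicorrelated trivariate normal with correlation 0.5.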
Taxonomy
Topics: Statistical Methods and Inference · Mathematical Approximation and Integration · Markov Chains and Monte Carlo Methods
