Missing Value Imputation for Mixed Data via Gaussian Copula

Yuxuan Zhao; Madeleine Udell

arXiv:1910.12845 · stat.ME · June 17, 2020 · KDD · 6 cites

TL;DR

This paper introduces a semiparametric imputation algorithm, based on a Gaussian copula, for mixed data sets containing real, Boolean, and ordinal variables. The algorithm requires no tuning parameters.

Contribution

It presents a new, tuning-free imputation method using Gaussian copulas that models mixed data with arbitrary marginals and handles ordinal and Boolean variables.
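
For orientation, the standard Gaussian copula model that this contribution builds on can be written as below. The symbols (z, Σ, f_j, p) are notation chosen here for illustration and are not quoted from this page.

```latex
\begin{aligned}
\mathbf{z} &\sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad \Sigma \text{ a correlation matrix},\\
x_j &= f_j(z_j), \qquad f_j \text{ monotone nondecreasing}, \quad j = 1, \dots, p.
\end{aligned}
```

Under this model, a continuous variable with CDF F_j corresponds to the strictly increasing map f_j = F_j^{-1} ∘ Φ, which is why arbitrary continuous marginals can be fit. An ordinal variable corresponds to a step function f_j, so an observed level only constrains z_j to an interval; a Boolean variable is the special case with a single threshold.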

Findings

1. Outperforms existing imputation methods on synthetic datasets.

2. Effective in modeling complex associations among mixed data types.

3. No tuning parameters required for the algorithm.

Abstract

Missing data imputation forms the first critical step of many data analysis pipelines. The challenge is greatest for mixed data sets, including real, Boolean, and ordinal data, where standard techniques for imputation fail basic sanity checks: for example, the imputed values may not follow the same distributions as the data. This paper proposes a new semiparametric algorithm to impute missing values, with no tuning parameters. The algorithm models mixed data as a Gaussian copula. This model can fit arbitrary marginals for continuous variables and can handle ordinal variables with many levels, including Boolean variables as a special case. We develop an efficient approximate EM algorithm to estimate copula parameters from incomplete mixed data. The resulting model reveals the statistical associations among variables. Experimental results on several synthetic and real datasets show…
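
The imputation idea in the abstract can be illustrated with a minimal sketch restricted to continuous columns. It is a simplification under stated assumptions, not the authors' implementation: marginals are estimated with the empirical CDF, the latent correlation is fit with a textbook EM for a zero-mean Gaussian with missing entries, and ordinal or Boolean columns (which the paper handles with its approximate EM) are not covered. All function names (`to_normal_scores`, `conditional_mean`, `fit_correlation`, `copula_impute`) are hypothetical, and each column is assumed to have at least one observed value.

```python
import numpy as np
from scipy.stats import norm, rankdata


def to_normal_scores(col):
    """Map the observed entries of one column to latent normal scores."""
    z = np.full(col.shape, np.nan)
    obs = ~np.isnan(col)
    # rank / (n_obs + 1) keeps the empirical CDF strictly inside (0, 1)
    z[obs] = norm.ppf(rankdata(col[obs]) / (obs.sum() + 1))
    return z


def conditional_mean(z_row, sigma):
    """Fill missing latent entries with E[z_mis | z_obs].

    Also returns the conditional covariance, embedded in a p x p matrix
    that is zero outside the missing block.
    """
    mis = np.isnan(z_row)
    obs = ~mis
    filled = z_row.copy()
    cov = np.zeros_like(sigma)
    if not mis.any():
        return filled, cov
    if obs.any():
        # w = Sigma_oo^{-1} Sigma_om
        w = np.linalg.solve(sigma[np.ix_(obs, obs)], sigma[np.ix_(obs, mis)])
        filled[mis] = w.T @ z_row[obs]
        cov[np.ix_(mis, mis)] = sigma[np.ix_(mis, mis)] - sigma[np.ix_(mis, obs)] @ w
    else:
        # Nothing observed in this row: fall back to the prior N(0, Sigma)
        filled[mis] = 0.0
        cov[np.ix_(mis, mis)] = sigma[np.ix_(mis, mis)]
    return filled, cov


def fit_correlation(Z, n_iter=50):
    """Textbook EM for the correlation matrix of a zero-mean Gaussian."""
    n, p = Z.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        S = np.zeros((p, p))
        for row in Z:
            filled, cov = conditional_mean(row, sigma)  # E-step
            S += np.outer(filled, filled) + cov
        S /= n
        d = np.sqrt(np.diag(S))
        sigma = S / np.outer(d, d)  # M-step, rescaled to unit diagonal
    return sigma


def copula_impute(X, n_iter=50):
    """Impute NaNs in a continuous matrix X; returns (X_imputed, sigma)."""
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([to_normal_scores(X[:, j]) for j in range(X.shape[1])])
    sigma = fit_correlation(Z, n_iter)
    X_imp = X.copy()
    for i, row in enumerate(Z):
        mis = np.isnan(row)
        if not mis.any():
            continue
        filled, _ = conditional_mean(row, sigma)
        for j in np.flatnonzero(mis):
            observed = X[~np.isnan(X[:, j]), j]
            # Map the latent value back through the empirical quantiles
            X_imp[i, j] = np.quantile(observed, norm.cdf(filled[j]))
    return X_imp, sigma
```

Because imputed values are read off the empirical quantiles of each column, they always fall within that column's observed range, which is one way to meet the sanity check mentioned in the abstract. The returned `sigma` is the estimated latent correlation matrix and gives a rough view of the associations among variables.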

Code & Models

Repositories

yuxuanzhao2295/Missing-Value-Imputation-for-Mixed-Data-via-Gaussian-Copula (official)

Taxonomy

Topics: Statistical Methods and Inference · Tensor decomposition and applications · Stochastic Gradient Optimization Techniques