Online Missing Value Imputation and Change Point Detection with the Gaussian Copula
Yuxuan Zhao, Eric Landgrebe, Eliot Shekhtman, Madeleine Udell

TL;DR
This paper introduces an online Gaussian copula-based imputation method for mixed data types that adapts to changing distributions and detects change points in the multivariate dependence structure. It improves on its offline counterpart in accuracy when the stream's distribution changes, and in speed.
Contribution
The paper presents a novel online imputation algorithm using Gaussian copulas that handles mixed data, adapts to distribution changes, and detects change points in dependence structures.
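To make the idea of copula-based online imputation concrete, here is a minimal sketch for the simplest possible case: two continuous columns, one latent correlation. It is not the paper's algorithm. The class name BivariateCopulaImputer, the decay parameter, and the exponentially weighted correlation update are all assumptions chosen for illustration, and ordinal or boolean columns are not handled. Each column is mapped to a standard normal score through its empirical CDF, the latent correlation is updated one row at a time, and a missing entry is filled with the conditional mean rho * z_observed mapped back through the empirical quantiles.

```python
from bisect import bisect_right, insort
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the latent scores


class BivariateCopulaImputer:
    """Toy online Gaussian copula imputer for two continuous columns.

    Illustrative only: names and update rule are assumptions, not the
    paper's method.
    """

    def __init__(self, decay=0.05):
        self.decay = decay     # weight given to each new complete row
        self.rho = 0.0         # running estimate of the latent correlation
        self.seen = [[], []]   # sorted observed values, one list per column

    def _to_latent(self, col, x):
        # Empirical CDF, shrunk away from 0 and 1 so inv_cdf stays finite.
        values = self.seen[col]
        u = (bisect_right(values, x) + 0.5) / (len(values) + 1)
        return STD_NORMAL.inv_cdf(u)

    def _from_latent(self, col, z):
        # Empirical quantile of the observed values at level Phi(z).
        values = self.seen[col]
        idx = min(int(STD_NORMAL.cdf(z) * len(values)), len(values) - 1)
        return values[idx]

    def partial_fit(self, row):
        # Record every observed entry, keeping each column sorted.
        for col, x in enumerate(row):
            if x is not None:
                insort(self.seen[col], x)
        # Only complete rows update the correlation estimate.
        if row[0] is not None and row[1] is not None:
            z0 = self._to_latent(0, row[0])
            z1 = self._to_latent(1, row[1])
            self.rho = (1 - self.decay) * self.rho + self.decay * z0 * z1
            self.rho = max(-0.99, min(0.99, self.rho))

    def impute(self, row):
        filled = list(row)
        for miss, obs in ((0, 1), (1, 0)):
            if row[miss] is None and row[obs] is not None and self.seen[miss]:
                # Conditional mean of the missing latent score given the other.
                z_obs = self._to_latent(obs, row[obs])
                filled[miss] = self._from_latent(miss, self.rho * z_obs)
        return filled


# Usage on a synthetic stream where the second column is missing in every tenth row.
imputer = BivariateCopulaImputer(decay=0.05)
for i in range(300):
    x = (i * 37) % 101
    y = None if i % 10 == 9 else 2 * x
    imputer.partial_fit([x, y])

high = imputer.impute([90, None])
low = imputer.impute([10, None])
```

Because the update weights recent rows more heavily, the correlation estimate can drift with the stream, which is the property an online method needs when the distribution changes. A full treatment would estimate a whole correlation matrix and handle ordinal data, which this sketch deliberately leaves out.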
Findings
Imputation accuracy improves over the offline counterpart when the streaming data has a changing distribution.
The method scales to large datasets and runs up to an order of magnitude faster than its offline counterpart.
The fitted model detects change points in the multivariate dependence structure even when entries are missing.
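As a rough illustration of what detecting a change in dependence structure means, the sketch below compares the Pearson correlation of the most recent complete rows with that of the rows just before them and flags the time steps where the two differ sharply. This is a generic sliding-window heuristic substituted for illustration, not the paper's detection procedure. The function names, the window size, and the threshold are assumptions, and rows with a missing entry are simply skipped rather than handled through a copula model.

```python
from collections import deque
from math import sqrt


def pearson(pairs):
    """Pearson correlation of a sequence of (x, y) pairs; 0.0 if degenerate."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def detect_dependence_changes(stream, window=30, threshold=1.0):
    """Return the indices at which the correlation of the newest `window`
    complete rows differs from that of the preceding `window` complete rows
    by more than `threshold` (both parameters are hypothetical defaults)."""
    old, new = deque(maxlen=window), deque(maxlen=window)
    flagged = []
    for t, (x, y) in enumerate(stream):
        if x is None or y is None:
            continue  # incomplete rows are skipped in this sketch
        if len(new) == window:
            old.append(new[0])  # the row about to leave `new` joins `old`
        new.append((x, y))
        if len(old) == window and abs(pearson(new) - pearson(old)) > threshold:
            flagged.append(t)
    return flagged


# Synthetic stream: y = x for the first 100 rows, y = -x afterwards,
# with y missing in every tenth row.
stream = []
for i in range(200):
    x = (i * 37) % 101
    y = x if i < 100 else -x
    stream.append((x, None if i % 10 == 9 else y))

flagged = detect_dependence_changes(stream)
```

On this stream the flagged indices all fall after the switch at row 100, since the two windows agree until rows from the second regime start entering the newer one.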
Abstract
Missing value imputation is crucial for real-world data science workflows. Imputation is harder in the online setting, as it requires the imputation method itself to be able to evolve over time. For practical applications, imputation algorithms should produce imputations that match the true data distribution, handle data of mixed types, including ordinal, boolean, and continuous variables, and scale to large datasets. In this work we develop a new online imputation algorithm for mixed data using the Gaussian copula. The online Gaussian copula model meets all the desiderata: its imputations match the data distribution even for mixed data, and it improves on its offline counterpart in accuracy when the streaming data has a changing distribution and in speed (by up to an order of magnitude), especially on large-scale datasets. By fitting the copula model to online data, we also provide a…
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Mental Health Research Topics · Health, Environment, Cognitive Aging
