EMFlow: Data Imputation in Latent Space via EM and Deep Flow Models
Qi Ma, Sujit K. Ghosh

TL;DR
EMFlow is a fast, iterative algorithm that performs data imputation in a latent space using an online EM approach combined with normalizing flows, improving accuracy and convergence speed for high-dimensional data.
Contribution
The paper introduces EMFlow, a method that combines online EM with deep flow models for efficient data imputation in high-dimensional spaces.
Findings
Outperforms recent methods in accuracy
Faster convergence in high-dimensional datasets
Effective for multivariate and image data
Abstract
The presence of missing values within high-dimensional data is a ubiquitous problem for many applied sciences. A serious limitation of many available data mining and machine learning methods is their inability to handle partially missing values, so an integrated approach that combines imputation and model estimation is vital for downstream analysis. A computationally fast algorithm, called EMFlow, is introduced that performs imputation in a latent space via an online version of the Expectation-Maximization (EM) algorithm, using a normalizing flow (NF) model that maps the data space to a latent space. The proposed EMFlow algorithm is iterative: it alternates between updating the parameters of the online EM and those of the NF. Extensive experimental results for high-dimensional multivariate and image datasets are presented to illustrate the superior performance of EMFlow compared to a couple of…
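To make the latent-space EM step more concrete, here is a minimal sketch of EM-based imputation under a multivariate Gaussian model. It is an illustration under stated assumptions, not the authors' implementation: the normalizing flow is omitted (the latent space is treated as identical to the data space), the EM runs in batch mode rather than online, and the function name gaussian_em_impute and its parameters n_iter and ridge are hypothetical. In the full method described in the abstract, a step of this kind would operate on flow-transformed data and alternate with updates of the flow parameters.

```python
import numpy as np

def gaussian_em_impute(X, n_iter=50, ridge=1e-6):
    """Batch EM for a multivariate Gaussian with NaN-coded missing entries.

    Assumption: the flow is the identity map, so "latent" and data space
    coincide. Returns (imputed copy of X, estimated mean, estimated covariance).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)

    # Initialise by filling missing entries with observed column means.
    mu = np.nanmean(X, axis=0)
    Z = np.where(miss, mu, X)
    sigma = np.cov(Z, rowvar=False) + ridge * np.eye(d)

    for _ in range(n_iter):
        # E-step: conditional mean of the missing block given the observed
        # block, plus the accumulated conditional covariance C.
        C = np.zeros((d, d))
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the marginal.
                Z[i] = mu
                C += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            K = np.linalg.solve(S_oo, S_mo.T).T  # S_mo @ inv(S_oo)
            Z[i, m] = mu[m] + K @ (Z[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ S_mo.T

        # M-step: re-estimate the mean and covariance from the completed data.
        mu = Z.mean(axis=0)
        diff = Z - mu
        sigma = (diff.T @ diff + C) / n + ridge * np.eye(d)

    return Z, mu, sigma
```

Observed entries are never modified; only NaN positions are filled, each with its conditional mean under the current Gaussian estimate. The ridge term is a small stabiliser added here for numerical safety and is not taken from the paper.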
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Adversarial Robustness in Machine Learning
