Efficient EM Training of Gaussian Mixtures with Missing Data

Olivier Delalleau; Aaron Courville; Yoshua Bengio

arXiv:1209.0521 · cs.LG · January 9, 2018 · 20 cites

TL;DR

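This paper proposes a fast EM algorithm for training Gaussian mixture models on data with missing values. A spanning-tree based procedure speeds up training when the data contain many different patterns of missing values, and the trained mixture is then used to impute the missing entries for a separate discriminant learning algorithm.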

Contribution

The paper introduces a spanning-tree based algorithm that accelerates EM training of Gaussian mixtures with missing data and demonstrates effective data imputation for discriminant learning.
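This summary does not describe how the spanning tree is built. As an illustration only, the sketch below assumes one plausible reading: each distinct missing-value pattern is a node, the cost of an edge is the number of variables whose missing status differs between two patterns (Hamming distance), and a minimum spanning tree (Prim's algorithm) gives an order in which patterns can be visited so that neighbouring patterns differ as little as possible. The function name `pattern_spanning_tree` and the choice of edge cost are assumptions, not details taken from the paper.

```python
import numpy as np

def pattern_spanning_tree(X):
    """Return the unique missingness patterns of X and a list of
    minimum-spanning-tree edges (parent, child, cost) over them.

    Assumption: edge cost = Hamming distance between boolean masks.
    """
    # One boolean row per distinct pattern (True = missing).
    patterns = np.unique(np.isnan(X), axis=0)
    n = len(patterns)
    # Pairwise Hamming distances between patterns.
    dist = (patterns[:, None, :] != patterns[None, :, :]).sum(axis=2)

    # Prim's algorithm, started from pattern 0.
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_cost = dist[0].astype(float)    # cheapest known link into the tree
    best_parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        # Pick the cheapest pattern that is not yet in the tree.
        candidates = np.where(in_tree, np.inf, best_cost)
        j = int(np.argmin(candidates))
        edges.append((int(best_parent[j]), j, int(best_cost[j])))
        in_tree[j] = True
        # Update the cheapest link for the remaining patterns.
        closer = dist[j] < best_cost
        best_cost = np.where(closer, dist[j], best_cost)
        best_parent = np.where(closer, j, best_parent)
    return patterns, edges
```

Under this assumption, a traversal of the returned edges moves between patterns that differ in few variables, which is where incremental reuse of per-pattern computations could save work.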

Findings

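01. The spanning-tree algorithm significantly speeds up training when the data contain many different patterns of missing values.

02. The generative model (a mixture of Gaussians) imputes missing entries through the conditional expectation of the missing variables given the observed variables.

03. Good results are obtained when the imputed data are passed to a separate discriminant learning algorithm.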

Abstract

In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms. A solution that we explore in this paper is the use of a generative model (a mixture of Gaussians) to compute the conditional expectation of the missing variables given the observed variables. Since training a Gaussian mixture with many different patterns of missing values can be computationally very expensive, we introduce a spanning-tree based algorithm that significantly speeds up training in these conditions. We also observe that good results can be obtained by using the generative model to fill in the missing values for a separate discriminant learning algorithm.
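To make the imputation step concrete, here is a minimal sketch of the conditional expectation of the missing variables given the observed ones under an already-trained Gaussian mixture. It uses the standard Gaussian conditioning formula per component, weighted by each component's posterior responsibility computed from the observed entries only. The function name `impute_conditional_mean` and its argument layout are hypothetical, and the sketch assumes each sample has at least one observed entry; it is not the authors' implementation and does not include their speed-up.

```python
import numpy as np

def impute_conditional_mean(x, weights, means, covs):
    """Fill the NaN entries of x with E[x_missing | x_observed]
    under a Gaussian mixture with the given parameters.

    Assumes at least one entry of x is observed.
    """
    x = np.asarray(x, dtype=float)
    m = np.isnan(x)   # missing mask
    o = ~m            # observed mask
    if not m.any():
        return x.copy()

    K = len(weights)
    log_resp = np.empty(K)
    cond_means = np.empty((K, int(m.sum())))
    for k in range(K):
        mu = np.asarray(means[k], dtype=float)
        S = np.asarray(covs[k], dtype=float)
        S_oo = S[np.ix_(o, o)]            # observed-observed block
        S_mo = S[np.ix_(m, o)]            # missing-observed block
        diff = x[o] - mu[o]
        sol = np.linalg.solve(S_oo, diff)  # S_oo^{-1} (x_o - mu_o)
        _, logdet = np.linalg.slogdet(S_oo)
        # log( weight_k * N(x_o; mu_o, S_oo) )
        log_resp[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + o.sum() * np.log(2 * np.pi))
        # Per-component conditional mean of the missing block.
        cond_means[k] = mu[m] + S_mo @ sol

    # Normalise responsibilities in a numerically stable way.
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()

    out = x.copy()
    out[m] = resp @ cond_means
    return out
```

The completed vector can then be handed to any discriminant learner that expects fully observed inputs. Note that this naive version solves a linear system for every sample and component, which is the cost that grows quickly when many different missing patterns are present.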

Code & Models

Repositories

xmartin46/missing-value-treatment-algorithms

Taxonomy

Topics: Bayesian Methods and Mixture Models · Algorithms and Data Compression · Data Mining Algorithms and Applications