Efficient EM Training of Gaussian Mixtures with Missing Data
Olivier Delalleau, Aaron Courville, Yoshua Bengio

TL;DR
This paper proposes a fast EM algorithm for training Gaussian mixture models on data with missing values. A spanning-tree approach reduces the computational cost of training, and the trained mixture is then used to impute missing values for discriminant tasks.
Contribution
The paper introduces a spanning-tree based algorithm that accelerates EM training of Gaussian mixtures with missing data and demonstrates effective data imputation for discriminant learning.
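This summary does not spell out how the spanning tree is built, so the following is only a rough sketch under stated assumptions: each node is a distinct missing-value pattern, the edge cost is the number of variables whose missing status differs between two patterns, and a minimum spanning tree (built here with Prim's algorithm) gives an order in which per-pattern quantities could be updated incrementally from a neighbouring pattern rather than recomputed from scratch. The function name `pattern_spanning_tree` and the cost function are hypothetical and are not taken from the paper.

```python
def pattern_spanning_tree(masks):
    """Build a minimum spanning tree over distinct missing-value patterns.

    masks: iterable of boolean sequences, True where a variable is missing.
    Returns (patterns, edges), where edges is a list of (parent, child, cost)
    index triples into patterns. Assumed cost: Hamming distance.
    """
    # Keep each distinct pattern once; sorting makes the output deterministic.
    patterns = sorted(set(tuple(bool(v) for v in m) for m in masks))

    def hamming(a, b):
        # Number of variables whose missing status differs.
        return sum(u != v for u, v in zip(a, b))

    # Prim's algorithm: best[j] = (cheapest cost to reach j, parent in tree).
    best = {j: (hamming(patterns[0], patterns[j]), 0)
            for j in range(1, len(patterns))}
    edges = []
    while best:
        j = min(best, key=lambda i: best[i])
        cost, parent = best.pop(j)
        edges.append((parent, j, cost))
        # Relax remaining nodes using the newly added pattern.
        for i in best:
            c = hamming(patterns[j], patterns[i])
            if c < best[i][0]:
                best[i] = (c, j)
    return patterns, edges


masks = [
    (False, False, False),
    (True, False, False),
    (True, True, False),
    (True, False, False),   # duplicate pattern, shared by several samples
    (False, False, True),
]
patterns, edges = pattern_spanning_tree(masks)
total_cost = sum(cost for _, _, cost in edges)
```

Under these assumptions, samples that share a pattern are handled together, and the total edge cost is a proxy for how much incremental work separates consecutive patterns.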
Findings
Spanning-tree algorithm significantly speeds up training.
Generative model effectively imputes missing data.
Improved performance in discriminant tasks with imputed data.
Abstract
In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms. A solution that we explore in this paper is the use of a generative model (a mixture of Gaussians) to compute the conditional expectation of the missing variables given the observed variables. Since training a Gaussian mixture with many different patterns of missing values can be computationally very expensive, we introduce a spanning-tree based algorithm that significantly speeds up training in these conditions. We also observe that good results can be obtained by using the generative model to fill in the missing values for a separate discriminant learning algorithm.
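The conditional expectation mentioned in the abstract has a standard closed form for a Gaussian mixture: within each component, the missing block is predicted by Gaussian conditioning on the observed block, and the per-component predictions are averaged using the posterior component probabilities given the observed variables. The sketch below illustrates that textbook formula with NumPy. The function name `impute_conditional_mean` and the NaN encoding of missing entries are assumptions made for illustration, and the mixture parameters are taken as already trained.

```python
import numpy as np


def impute_conditional_mean(x, weights, means, covs):
    """Fill NaN entries of x with E[x_missing | x_observed] under a GMM.

    x: (d,) vector with NaN marking missing entries (assumed encoding).
    weights: (K,) mixing proportions; means: (K, d); covs: (K, d, d).
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    miss = np.isnan(x)
    obs = ~miss
    if not miss.any():
        return x.copy()

    n_comp = len(weights)
    log_post = np.empty(n_comp)
    cond_means = np.empty((n_comp, miss.sum()))
    for k in range(n_comp):
        diff = x[obs] - means[k][obs]
        cov_oo = covs[k][np.ix_(obs, obs)]    # observed-observed block
        cov_mo = covs[k][np.ix_(miss, obs)]   # missing-observed block
        sol = np.linalg.solve(cov_oo, diff)
        _, logdet = np.linalg.slogdet(cov_oo)
        # log of weight times the Gaussian density of the observed part
        log_post[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + obs.sum() * np.log(2 * np.pi))
        # Gaussian conditioning: mu_m + S_mo S_oo^{-1} (x_o - mu_o)
        cond_means[k] = means[k][miss] + cov_mo @ sol

    # Normalise posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    filled = x.copy()
    filled[miss] = post @ cond_means
    return filled


# Single component with correlation 0.5: E[x2 | x1 = 2] = 0.5 * 2.
one = impute_conditional_mean(
    [2.0, np.nan], [1.0], [[0.0, 0.0]], [[[1.0, 0.5], [0.5, 1.0]]])

# Two well-separated components: the observed value selects the first one.
two = impute_conditional_mean(
    [0.0, np.nan], [0.5, 0.5], [[0.0, 0.0], [10.0, 10.0]],
    [np.eye(2), np.eye(2)])
```

The filled vector can then be passed to any discriminant learner that expects complete inputs, which is the use the abstract describes.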
Taxonomy
Topics: Bayesian Methods and Mixture Models · Algorithms and Data Compression · Data Mining Algorithms and Applications
