Deep Mixtures of Factor Analysers
Yichuan Tang (University of Toronto), Ruslan Salakhutdinov (University of Toronto), Geoffrey Hinton (University of Toronto)

TL;DR
This paper introduces a greedy layer-wise learning algorithm for Deep Mixtures of Factor Analysers (DMFAs). Learning and inference are more efficient in a DMFA than in the equivalent shallow MFA, and DMFAs are reported to be less prone to overfitting and to give better density models than shallow MFAs and RBMs.
Contribution
The paper presents a novel greedy layer-wise training method for DMFAs, improving efficiency and model quality over existing shallow models and RBMs.
Findings
DMFAs outperform MFAs and RBMs on various datasets.
Learning and inference are more efficient in a DMFA than in the equivalent shallow MFA.
Sharing lower-level factors prevents overfitting.
Abstract
An efficient way to learn deep density models that have many layers of latent variables is to learn one layer at a time using a model that has only one layer of latent variables. After learning each layer, samples from the posterior distributions for that layer are used as training data for learning the next layer. This approach is commonly used with Restricted Boltzmann Machines, which are undirected graphical models with a single hidden layer, but it can also be used with Mixtures of Factor Analysers (MFAs) which are directed graphical models. In this paper, we present a greedy layer-wise learning algorithm for Deep Mixtures of Factor Analysers (DMFAs). Even though a DMFA can be converted to an equivalent shallow MFA by multiplying together the factor loading matrices at different levels, learning and inference are much more efficient in a DMFA and the sharing of each lower-level…
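The remark that a DMFA can be converted to a shallow MFA by multiplying factor loading matrices can be checked numerically on a toy case. The following is a minimal sketch, not the paper's implementation. It assumes a single Gaussian component per layer, zero means, and diagonal noise at each layer. The names W1, W2, psi1, psi2 and all numbers are illustrative assumptions, and plain Python lists are used so the example has no dependencies.

```python
# Sketch: collapse a two-layer linear-Gaussian model (one component per layer,
# zero means) into a single shallow factor analyser. All values are made up.

def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def diag(v):
    """Diagonal matrix built from a list of variances."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]

# Layer 1 (assumed): x = W1 z1 + e1, e1 ~ N(0, diag(psi1)); x is 4-D, z1 is 3-D.
W1 = [[1.0, 0.5, 0.0],
      [0.0, 1.0, 0.5],
      [0.5, 0.0, 1.0],
      [1.0, 1.0, 1.0]]
psi1 = [0.1, 0.2, 0.3, 0.4]

# Layer 2 (assumed): z1 = W2 z2 + e2, e2 ~ N(0, diag(psi2)), z2 ~ N(0, I); z2 is 2-D.
W2 = [[1.0, 0.0],
      [0.5, 0.5],
      [0.0, 1.0]]
psi2 = [0.5, 0.5, 0.5]

# Deep view: propagate the covariance one layer at a time.
cov_z1 = add(matmul(W2, transpose(W2)), diag(psi2))
cov_deep = add(matmul(matmul(W1, cov_z1), transpose(W1)), diag(psi1))

# Shallow view: one loading matrix W = W1 W2 plus the accumulated noise term.
W = matmul(W1, W2)
noise = add(matmul(matmul(W1, diag(psi2)), transpose(W1)), diag(psi1))
cov_shallow = add(matmul(W, transpose(W)), noise)

# Largest absolute difference between the two marginal covariances of x.
max_gap = max(abs(p - q)
              for rp, rq in zip(cov_deep, cov_shallow)
              for p, q in zip(rp, rq))
print("collapsed loading shape:", len(W), "x", len(W[0]))
print("max covariance gap:", max_gap)
```

In this sketch the two views give the same marginal covariance for x up to floating-point error. The accumulated noise term `noise` is a full matrix here rather than a diagonal one; how the paper handles that term is not covered by the excerpt above.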
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis
