Learning Factored Representations in a Deep Mixture of Experts

David Eigen; Marc'Aurelio Ranzato; Ilya Sutskever

arXiv:1312.4314 · cs.LG · March 11, 2014 · ICLR · 137 cites

TL;DR

This paper introduces a deep, stacked mixture of experts model that increases the effective number of experts exponentially while keeping the model size modest. On randomly translated MNIST it learns location-dependent experts at the first layer and class-specific experts at deeper layers, and on speech data it makes use of all expert combinations.

Contribution

The work extends Mixture of Experts to a deep, multi-layer model, enabling exponential growth in expert combinations with efficient computation and training.

Findings

01 · Learned location-dependent experts for images

02 · Developed class-specific experts at deeper layers

03 · Effectively used all expert combinations in speech data

Abstract

Mixtures of Experts combine the outputs of several "expert" networks, each of which specializes in a different part of the input space. This is achieved by training a "gating" network that maps each input to a distribution over the experts. Such models show promise for building larger networks that are still cheap to compute at test time, and more parallelizable at training time. In this work, we extend the Mixture of Experts to a stacked model, the Deep Mixture of Experts, with multiple sets of gating and experts. This exponentially increases the number of effective experts by associating each input with a combination of experts at each layer, yet maintains a modest model size. On a randomly translated version of the MNIST dataset, we find that the Deep Mixture of Experts automatically learns to develop location-dependent ("where") experts at the first layer, and class-specific…
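
The gating-and-experts mechanism described in the abstract can be illustrated with a small forward pass. The following is a minimal sketch, not the authors' implementation: the class name `MoELayer`, the helper functions, the purely linear experts and gates, the layer sizes, and the choice to feed the second layer's gate with the first layer's output are all assumptions made for illustration. It uses only the Python standard library.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical initializer: small Gaussian weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class MoELayer:
    """One mixture-of-experts layer (illustrative): a linear gate plus
    linear experts. Real experts would usually be nonlinear networks."""

    def __init__(self, n_experts, in_dim, out_dim, rng):
        self.gate = random_matrix(n_experts, in_dim, rng)
        self.experts = [random_matrix(out_dim, in_dim, rng)
                        for _ in range(n_experts)]

    def forward(self, x):
        # The gate maps the input to a distribution over the experts.
        g = softmax(matvec(self.gate, x))
        # Every expert produces its own output for the same input.
        outs = [matvec(w, x) for w in self.experts]
        # The layer output is the gate-weighted sum of expert outputs.
        mixed = [sum(g[i] * outs[i][d] for i in range(len(outs)))
                 for d in range(len(outs[0]))]
        return mixed, g


rng = random.Random(0)
layer1 = MoELayer(n_experts=4, in_dim=8, out_dim=6, rng=rng)
layer2 = MoELayer(n_experts=4, in_dim=6, out_dim=3, rng=rng)

x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
h, g1 = layer1.forward(x)   # first set of gating and experts
y, g2 = layer2.forward(h)   # second set, applied to the first layer's output

n_expert_networks = len(g1) + len(g2)   # model size grows additively: 8
n_combinations = len(g1) * len(g2)      # expert pairings grow multiplicatively: 16
```

With four experts per layer, the sketch holds only 8 expert networks, yet each input can be associated with any of 16 first-layer/second-layer pairings. Under the same assumption of N experts per layer, L stacked layers would give N^L pairings from N·L expert networks, which is the sense in which the effective number of experts grows exponentially while the model stays modest in size.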

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing