Deep Mixture of Experts via Shallow Embedding
Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez

TL;DR
DeepMoE introduces a dynamic routing architecture that adaptively sparsifies and recalibrates channels in convolutional networks, achieving higher accuracy with less computation on benchmark datasets.
Contribution
This paper proposes a novel DeepMoE architecture that uses a multi-headed sparse gating network for dynamic channel selection, enhancing representational power over standard CNNs.
Findings
Achieves higher accuracy than standard CNNs on benchmark datasets.
Reduces computational cost while maintaining or improving performance.
Demonstrates effectiveness of dynamic sparsification in deep networks.
Abstract
Larger networks generally have greater representational power at the cost of increased computational complexity. Sparsifying such networks has been an active area of research but has been generally limited to static regularization or dynamic approaches using reinforcement learning. We explore a mixture of experts (MoE) approach to deep dynamic routing, which activates certain experts in the network on a per-example basis. Our novel DeepMoE architecture increases the representational power of standard convolutional networks by adaptively sparsifying and recalibrating channel-wise features in each convolutional layer. We employ a multi-headed sparse gating network to determine the selection and scaling of channels for each input, leveraging exponential combinations of experts within a single convolutional network. Our proposed architecture is evaluated on four benchmark datasets and…
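To make the per-example channel selection and scaling concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that an input embedding is fed to one linear "head" per convolutional layer, and that a ReLU on each head's output yields non-negative gate values, so a gate of exactly zero switches a channel off and a positive gate rescales it. The names `make_head`, `gate_values`, `apply_gates`, and `active_channels` are hypothetical, and the random weights stand in for learned ones.

```python
import random


def make_head(num_channels, embed_dim, rng):
    """Create one hypothetical gating head: a linear map from the
    embedding to one raw gate score per channel of a layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
               for _ in range(num_channels)]
    bias = [0.0] * num_channels
    return weights, bias


def gate_values(head, embedding):
    """Compute non-negative gates. The ReLU is an assumption of this
    sketch: it makes some gates exactly zero (sparse selection) and
    leaves the rest as positive scaling factors (recalibration)."""
    weights, bias = head
    scores = [sum(w * e for w, e in zip(row, embedding)) + b
              for row, b in zip(weights, bias)]
    return [s if s > 0.0 else 0.0 for s in scores]


def apply_gates(feature_maps, gates):
    """Scale each channel by its gate. Each channel is a flat list of
    activations; a zero gate zeroes the whole channel."""
    return [[g * a for a in channel]
            for channel, g in zip(feature_maps, gates)]


def active_channels(gates):
    """Indices of channels that stay switched on for this input."""
    return [i for i, g in enumerate(gates) if g > 0.0]


rng = random.Random(0)
embed_dim = 3
# One head per layer: the "multi-headed" part of the gating network.
heads = [make_head(4, embed_dim, rng), make_head(6, embed_dim, rng)]

embedding = [0.5, -0.2, 0.1]  # stand-in for a per-example embedding
for layer_index, head in enumerate(heads):
    gates = gate_values(head, embedding)
    print("layer", layer_index, "active channels:", active_channels(gates))
```

A different embedding generally produces a different set of active channels in each layer. A layer with C channels admits up to 2^C on/off patterns, which is one way to read the abstract's "exponential combinations of experts within a single convolutional network".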
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques · Anomaly Detection Techniques and Applications
