MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan

TL;DR
MoE++ introduces a heterogeneous Mixture-of-Experts framework with zero-computation experts, enhancing efficiency and performance by dynamically adjusting expert engagement and reducing communication overhead.
Contribution
The paper proposes MoE++, a novel framework integrating zero-computation experts, enabling dynamic expert engagement and improved efficiency over traditional MoE models.
Findings
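Achieves 1.1-2.1x the expert forward throughput of a vanilla MoE model of the same size.
Delivers better performance than vanilla MoE with less computation, since simple tokens can engage fewer FFN experts.
Allows all zero-computation experts to be deployed on every GPU, reducing the communication overhead of routing tokens to FFN experts on other GPUs.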
Abstract
In this work, we aim to simultaneously enhance the effectiveness and efficiency of Mixture-of-Experts (MoE) methods. To achieve this, we propose MoE++, a general and heterogeneous MoE framework that integrates both Feed-Forward Network (FFN) and zero-computation experts. Specifically, we introduce three types of zero-computation experts: the zero expert, copy expert, and constant expert, which correspond to discard, skip, and replace operations, respectively. This design offers three key advantages: (i) Low Computing Overhead: Unlike the uniform mixing mechanism for all tokens within vanilla MoE, MoE++ allows each token to engage with a dynamic number of FFNs, be adjusted by constant vectors, or even skip the MoE layer entirely. (ii) High Performance: By enabling simple tokens to utilize fewer FFN experts, MoE++ allows more experts to focus on challenging tokens, thereby unlocking…
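The three zero-computation experts named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy ReLU FFN, the plain softmax router, and the use of unnormalized top-k gate weights are all assumptions made for illustration, and the constant expert is reduced to returning a fixed vector (the paper's exact formulation may differ). The sketch only shows how discard, skip, and replace operations can sit next to FFN experts behind a single router, so that a token selecting zero-computation experts incurs no FFN cost.

```python
import math

def zero_expert(x):
    """Discard: contributes a zero vector to the layer output."""
    return [0.0 for _ in x]

def copy_expert(x):
    """Skip: passes the token through unchanged."""
    return list(x)

def make_constant_expert(v):
    """Replace: returns a fixed vector v (trainable in a real model; assumption)."""
    def constant_expert(x):
        return list(v)
    return constant_expert

def make_ffn_expert(w1, w2):
    """Toy two-layer ReLU FFN; w1 is (hidden x d), w2 is (d x hidden)."""
    def ffn_expert(x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]
    return ffn_expert

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, experts, router_w, top_k=2):
    """Route one token to its top_k experts and mix their outputs.

    router_w has one row of length d per expert (hypothetical linear router).
    Returns the mixed output and the indices of the selected experts.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_w]
    probs = softmax(logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        out = [o + probs[i] * yi for o, yi in zip(out, y)]
    return out, chosen

# Usage: one FFN expert plus the three zero-computation experts.
experts = [
    make_ffn_expert([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
    zero_expert,
    copy_expert,
    make_constant_expert([0.5, -0.5]),
]
router_w = [[0.0, 0.0], [0.0, 0.0], [10.0, 0.0], [0.0, 0.0]]
out, chosen = moe_layer([1.0, 0.0], experts, router_w, top_k=1)
```

In this usage the router sends the token to the copy expert only, so no FFN is evaluated for it. That is the sense in which a token can engage a dynamic number of FFN experts, including none.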
Taxonomy
Topics: Distributed Sensor Networks and Detection Algorithms · Water Quality Monitoring Technologies · Machine Learning and Algorithms
Methods: Focus · Mixture of Experts
