MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Peng Jin; Bo Zhu; Li Yuan; Shuicheng Yan

arXiv:2410.07348·cs.LG·October 11, 2024

Open Access 3 Repos 2 Models 1 Video

TL;DR

MoE++ introduces a heterogeneous Mixture-of-Experts framework with zero-computation experts, enhancing efficiency and performance by dynamically adjusting expert engagement and reducing communication overhead.

Contribution

The paper proposes MoE++, a novel framework integrating zero-computation experts, enabling dynamic expert engagement and improved efficiency over traditional MoE models.

Findings

01

Achieves 1.1x to 2.1x the expert forward throughput of a vanilla MoE model of the same size.

02

Demonstrates improved performance with fewer computational resources.

03

Enables deployment of zero-computation experts on each GPU, reducing communication overhead.

Abstract

In this work, we aim to simultaneously enhance the effectiveness and efficiency of Mixture-of-Experts (MoE) methods. To achieve this, we propose MoE++, a general and heterogeneous MoE framework that integrates both Feed-Forward Network (FFN) and zero-computation experts. Specifically, we introduce three types of zero-computation experts: the zero expert, copy expert, and constant expert, which correspond to discard, skip, and replace operations, respectively. This design offers three key advantages: (i) Low Computing Overhead: Unlike the uniform mixing mechanism for all tokens within vanilla MoE, MoE++ allows each token to engage with a dynamic number of FFNs, be adjusted by constant vectors, or even skip the MoE layer entirely. (ii) High Performance: By enabling simple tokens to utilize fewer FFN experts, MoE++ allows more experts to focus on challenging tokens, thereby unlocking…
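The three zero-computation experts and the "dynamic number of FFNs" idea can be illustrated with a minimal sketch. This is not the authors' implementation: it takes the simplest reading of the abstract (zero expert returns zeros, copy expert returns the input, constant expert returns a fixed vector), uses a toy `tanh` function as a stand-in for a real FFN, and assumes plain top-2 routing over softmax gates without renormalization. All function names, the router scores, and the vector sizes are hypothetical.

```python
import math

def zero_expert(x):
    # "Discard": contributes nothing to the layer output.
    return [0.0] * len(x)

def copy_expert(x):
    # "Skip": passes the token through unchanged.
    return list(x)

def make_constant_expert(v):
    # "Replace": returns a fixed vector v (learned in a real model; assumed here).
    def constant_expert(x):
        return list(v)
    return constant_expert

def make_ffn_expert(scale):
    # Toy stand-in for an FFN expert (assumption, not a real feed-forward network).
    def ffn_expert(x):
        return [math.tanh(scale * xi) for xi in x]
    ffn_expert.is_ffn = True  # marker used only to count FFN invocations
    return ffn_expert

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, experts, router_scores, top_k=2):
    # Pick the top_k experts by gate value and sum their gated outputs.
    gates = softmax(router_scores)
    order = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)
    out = [0.0] * len(x)
    ffn_calls = 0
    for i in order[:top_k]:
        y = experts[i](x)
        out = [o + gates[i] * yi for o, yi in zip(out, y)]
        if getattr(experts[i], "is_ffn", False):
            ffn_calls += 1
    return out, ffn_calls

# Pool of two toy FFN experts plus the three zero-computation experts.
experts = [
    make_ffn_expert(1.0),
    make_ffn_expert(0.5),
    zero_expert,
    copy_expert,
    make_constant_expert([1.0, 1.0]),
]
token = [0.2, -0.4]

# Hypothetical router scores for three tokens of differing "difficulty".
_, hard_calls = moe_layer(token, experts, [3.0, 2.0, 0.0, 0.0, 0.0])
_, easy_calls = moe_layer(token, experts, [0.0, 0.0, 3.0, 2.0, 0.0])
_, mixed_calls = moe_layer(token, experts, [3.0, 0.0, 0.0, 0.0, 2.0])
print(hard_calls, easy_calls, mixed_calls)  # FFN experts used per token: 2 0 1
```

With the same top-2 budget, the number of FFN evaluations per token varies between zero and two depending on which experts the router selects, which is the source of the compute savings described in advantage (i).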

Code & Models

Videos

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts · SlidesLive

Taxonomy

Topics: Distributed Sensor Networks and Detection Algorithms · Water Quality Monitoring Technologies · Machine Learning and Algorithms

Methods: Focus · Mixture of Experts