Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Yongxin Guo; Zhenglin Cheng; Xiaoying Tang; Zhaopeng Tu; Tao Lin

arXiv:2405.14297 · cs.LG · March 11, 2025

Open Access 1 Repo 3 Models

TL;DR

The paper introduces DynMoE, an auto-tuning method for Transformer models that lets each token choose how many experts to activate and adjusts the number of experts during training, improving efficiency and performance across vision, language, and multimodal tasks.

Contribution

It proposes a novel gating method and an adaptive expert-adjustment process that together let Mixture of Experts models select experts automatically and train more efficiently.
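The gating mechanism mentioned above differs from standard SMoE routing, where every token activates a fixed top-k number of experts. The pure-Python sketch below contrasts the two under stated assumptions: each expert has a gating key and a threshold (trainable in a real model, fixed here), a token activates every expert whose sigmoid-transformed cosine similarity exceeds that expert's threshold, and it falls back to its single best expert if none qualifies. The function names `topk_gate` and `threshold_gate`, the cosine-plus-sigmoid scoring, and the fallback rule are illustrative assumptions, not the official lins-lab/dynmoe implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def topk_gate(scores: Sequence[float], k: int) -> List[int]:
    """Standard SMoE routing: every token activates exactly k experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def threshold_gate(
    token: Vector, expert_keys: Sequence[Vector], thresholds: Sequence[float]
) -> List[int]:
    """Hypothetical per-token routing where the number of active experts varies.

    An expert is activated when sigmoid(cosine(token, key)) exceeds its
    threshold. If no expert qualifies, fall back to the best-scoring one.
    """
    scores = [sigmoid(cosine(token, key)) for key in expert_keys]
    active = [i for i, (s, t) in enumerate(zip(scores, thresholds)) if s > t]
    if not active:
        active = [max(range(len(scores)), key=lambda i: scores[i])]
    return active


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    thresholds = [0.6, 0.6, 0.6]
    for tok in ([1.0, 0.0], [1.0, 1.0], [0.2, -1.0]):
        print(tok, threshold_gate(tok, keys, thresholds))
```

With these toy vectors the three tokens activate one, two, and one expert respectively (the third only through the fallback), whereas `topk_gate(scores, 2)` would activate exactly two experts for every token.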

Findings

01. Achieves competitive performance with fewer activated experts.

02. Reduces computational overhead during training.

03. Demonstrates effectiveness across vision, language, and vision-language tasks.

Abstract

The Sparse Mixture of Experts (SMoE) has been widely employed to enhance the efficiency of training and inference for Transformer-based foundational models, yielding promising results. However, the performance of SMoE heavily depends on the choice of hyper-parameters, such as the number of experts and the number of experts to be activated (referred to as top-k), and searching over hyper-parameter configurations requires training many models, which incurs significant computational overhead. As a remedy, we introduce the Dynamic Mixture of Experts (DynMoE) technique. DynMoE incorporates (1) a novel gating method that enables each token to automatically determine the number of experts to activate, and (2) an adaptive process that automatically adjusts the number of experts during training. Extensive numerical results across Vision, Language, and Vision-Language tasks demonstrate the…
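The second component in the abstract, an adaptive process that changes the number of experts during training, can be pictured as a periodic adjustment of the expert pool. The sketch below assumes a simple rule: after a window of routing decisions, any expert that received no tokens is removed, and if some tokens cleared no expert's threshold, one new expert is added whose gating key is the mean of those tokens. Both rules, the function `adjust_expert_pool`, and the mean initialization are hypothetical; they illustrate the idea rather than reproduce the paper's procedure.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def adjust_expert_pool(
    expert_keys: Sequence[Vector],
    tokens: Sequence[Vector],
    routing: Sequence[Sequence[int]],
) -> Tuple[List[Vector], List[int], bool]:
    """Hypothetical adjustment of the expert pool after a window of steps.

    routing[j] lists the experts token j activated before any fallback;
    an empty list means the token cleared no expert's threshold.
    Returns (new_expert_keys, kept_indices, added_new_expert).
    """
    if not routing:
        # No evidence collected in this window: leave the pool unchanged.
        return [list(k) for k in expert_keys], list(range(len(expert_keys))), False

    counts = [0] * len(expert_keys)
    unrouted: List[Vector] = []
    for token, active in zip(tokens, routing):
        if not active:
            unrouted.append(list(token))
        for i in active:
            counts[i] += 1

    # Remove experts that received no tokens during the window.
    kept = [i for i, c in enumerate(counts) if c > 0]
    new_keys = [list(expert_keys[i]) for i in kept]

    # Add one expert if some tokens were left without any expert; its gating
    # key is initialised to the mean of those tokens (an illustrative choice).
    added = bool(unrouted)
    if added:
        dim = len(unrouted[0])
        mean = [sum(t[d] for t in unrouted) / len(unrouted) for d in range(dim)]
        new_keys.append(mean)
    return new_keys, kept, added


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5], [0.25, -0.75]]
    routing = [[0], [1], [], []]
    new_keys, kept, added = adjust_expert_pool(keys, tokens, routing)
    print(kept, added, new_keys)
```

In this toy run, experts 2 and 3 receive no tokens and are dropped, while the two unrouted tokens trigger one new expert with key [0.375, -0.625], so the pool shrinks from four experts to three without any manual choice of the expert count.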

Code & Models

Repositories

lins-lab/dynmoe (PyTorch, official)

Taxonomy

Topics: Target Tracking and Data Fusion in Sensor Networks