MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu,, Xiang Bai

TL;DR
MoE Jetpack offers a practical approach to convert dense vision models into efficient Mixture of Experts models through fine-tuning, reducing training costs and improving performance.
Contribution
We introduce MoE Jetpack, a method combining checkpoint recycling and hyperspherical adaptive MoE layers to facilitate effective fine-tuning of dense models into MoE models.
Findings
Significantly faster convergence when fine-tuning with MoE Jetpack.
Improved accuracy of MoE models on vision tasks.
Reduced computational resources needed for MoE training.
Abstract
The sparsely activated mixture of experts (MoE) model presents a promising alternative to traditional densely activated (dense) models, enhancing both quality and computational efficiency. However, training MoE models from scratch demands extensive data and computational resources. Moreover, public repositories like timm mainly provide pre-trained dense checkpoints, lacking similar resources for MoE models, hindering their adoption. To bridge this gap, we introduce MoE Jetpack, an effective method for fine-tuning dense checkpoints into MoE models. MoE Jetpack incorporates two key techniques: (1) checkpoint recycling, which repurposes dense checkpoints as initial weights for MoE models, thereby accelerating convergence, enhancing accuracy, and alleviating the computational burden of pre-training; (2) hyperspherical adaptive MoE (SpheroMoE) layer, which optimizes the MoE architecture for…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsRobotics and Automated Systems
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Mixture of Experts
