A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers

TL;DR
This paper introduces a provably effective method for pruning experts in fine-tuned sparse Mixture-of-Experts models, reducing model size and computation while preserving accuracy, supported by theoretical guarantees and empirical validation.
Contribution
It presents the first provably efficient expert pruning technique for fine-tuned MoE models, with theoretical guarantees and empirical validation on large vision models.
Findings
Pruning the experts whose router l2 norm changed least from the pretrained model preserves test accuracy.
Significant reduction in model size and computation achieved.
Method validated on large vision MoE models and benchmark datasets.
Abstract
The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers. MoE significantly reduces training computation for large models, but its deployment can still be memory- or computation-intensive for some downstream tasks. Model pruning is a popular approach to reducing inference computation, but its application to MoE architectures is largely unexplored. To the best of our knowledge, this paper provides the first provably efficient technique for pruning experts in fine-tuned MoE models. We theoretically prove that prioritizing the pruning of the experts with a smaller change in the router's l2 norm from the pretrained model guarantees the preservation of test accuracy, while significantly reducing the model size and the computational requirements. Although our theoretical analysis is centered on binary…
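The selection rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes each expert's router is a single weight vector stored as one row of a matrix, and that "change" means the fine-tuned l2 norm minus the pretrained l2 norm. The function names, array layout, and toy values are all hypothetical.

```python
import numpy as np

def router_norm_change(pretrained, finetuned):
    """Per-expert change in router l2 norm.

    Both inputs are assumed to have shape (num_experts, dim), with one
    router weight vector per expert (an assumption of this sketch).
    """
    return np.linalg.norm(finetuned, axis=1) - np.linalg.norm(pretrained, axis=1)

def select_experts_to_prune(pretrained, finetuned, num_prune):
    """Return the indices of the num_prune experts with the smallest change."""
    if not 0 <= num_prune <= pretrained.shape[0]:
        raise ValueError("num_prune must be between 0 and the number of experts")
    change = router_norm_change(pretrained, finetuned)
    order = np.argsort(change, kind="stable")  # ascending: smallest change first
    return sorted(order[:num_prune].tolist())

# Toy router weights for four experts (made-up values, for illustration only).
pretrained = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
finetuned = np.array([[3.0, 4.0], [0.0, 1.1], [1.0, 1.0], [2.0, 0.0]])

pruned = select_experts_to_prune(pretrained, finetuned, num_prune=2)
print(pruned)
```

In this toy case, experts 1 and 2 barely move during fine-tuning, so they are the ones selected for pruning. If "change" is read instead as the norm of the weight difference, only `router_norm_change` would need to be swapped out.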
Taxonomy
Topics: Indoor and Outdoor Localization Technologies · Distributed Sensor Networks and Detection Algorithms · Flow Measurement and Analysis
Methods: Pruning · Mixture of Experts
