CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Quang Pham, Giang Do, Huy Nguyen, TrungTin Nguyen, Chenghao Liu, Mina Sartipi, Binh T. Nguyen, Savitha Ramasamy, Xiaoli Li, Steven Hoi, Nhat Ho

TL;DR
This paper introduces CompeteSMoE, a training method for sparse mixture of experts (SMoE) that routes each input to the experts with the highest neural response. This competition-based routing targets representation collapse and improves scalability and performance while keeping computational overhead low.
Contribution
The paper proposes a competition mechanism for SMoE training that addresses representation collapse and introduces CompeteSMoE, a simple, efficient algorithm with strong empirical results.
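As a rough illustration of the competition mechanism, the sketch below runs every expert on an input, scores each expert by its "neural response", keeps the top k, and mixes the winners' outputs. It is a hypothetical sketch, not the authors' implementation: the response measure (the L2 norm of an expert's output), the softmax mixing weights, the toy scaling experts, and the names `competition_route` and `make_scaler` are all assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(t * t for t in v))


def competition_route(
    x: Vector, experts: Sequence[Callable[[Vector], Vector]], k: int
) -> Tuple[Vector, List[int], List[float]]:
    """Send x to the k experts with the highest response (hypothetical sketch)."""
    outputs = [f(x) for f in experts]          # every expert runs on x
    responses = [l2_norm(o) for o in outputs]  # assumed response: output norm
    winners = sorted(range(len(experts)), key=lambda j: responses[j], reverse=True)[:k]
    # Assumed mixing rule: softmax over the winners' responses (max-shifted for stability).
    top = max(responses[j] for j in winners)
    exps = [math.exp(responses[j] - top) for j in winners]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the winning experts' outputs, dimension by dimension.
    dim = len(outputs[0])
    y = [sum(w * outputs[j][i] for w, j in zip(weights, winners)) for i in range(dim)]
    return y, winners, weights


def make_scaler(c: float) -> Callable[[Vector], Vector]:
    """Toy expert (hypothetical): multiplies its input by a constant c."""
    return lambda v: [c * t for t in v]


experts = [make_scaler(c) for c in (0.5, 2.0, 1.0, 3.0)]
y, winners, weights = competition_route([1.0, 0.0], experts, k=2)
print(winners)  # [3, 1]
print(weights)
```

In this sketch every expert must run before the winners are known, so pure competition costs as much as evaluating all experts densely; the winners are simply the two experts with the largest scaling constants.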
Findings
CompeteSMoE outperforms existing SMoE methods.
The competition routing policy enhances model robustness and scalability.
The proposed method adds little computational overhead.
Abstract
Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model complexity beyond the means of increasing the network's depth or width. However, effective training of SMoE has proven to be challenging due to the representation collapse issue, which causes parameter redundancy and limited representation potential. In this work, we propose a competition mechanism to address this fundamental challenge of representation collapse. By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator. We further propose CompeteSMoE, an effective and efficient algorithm to train large language models by deploying a simple router that predicts the competition outcomes. Consequently, CompeteSMoE enjoys strong performance gains from the competition routing policy while…
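The idea of a router that predicts competition outcomes can be sketched as a cheap linear router regressed onto the competition responses, so that at inference the top-k choice comes from router logits alone and only the selected experts need to run. This is a hypothetical illustration: the toy per-dimension scaling experts (`SCALES`), the L2-norm response, the linear router, the plain mean-squared-error objective, and every function name are assumptions; the paper's actual router and training objective may differ.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

# Hypothetical experts: expert j scales input dimension i by SCALES[j][i].
SCALES: List[Vector] = [[0.5, 2.0], [2.0, 0.5], [1.0, 1.0], [3.0, 0.2]]


def expert_output(j: int, x: Vector) -> Vector:
    return [s * t for s, t in zip(SCALES[j], x)]


def competition_responses(x: Vector) -> Vector:
    """Assumed response: L2 norm of each expert's output (runs every expert)."""
    return [math.sqrt(sum(v * v for v in expert_output(j, x))) for j in range(len(SCALES))]


def router_logits(w: List[Vector], x: Vector) -> Vector:
    """Cheap linear router: one dot product per expert, no expert is evaluated."""
    return [sum(a * b for a, b in zip(w_j, x)) for w_j in w]


def mse(w: List[Vector], data: Sequence[Vector]) -> float:
    """Mean squared gap between router logits and competition responses."""
    errs = [
        (r - s) ** 2
        for x in data
        for r, s in zip(router_logits(w, x), competition_responses(x))
    ]
    return sum(errs) / len(errs)


def train_router(data: Sequence[Vector], lr: float = 0.1, steps: int = 200) -> List[Vector]:
    """Full-batch gradient descent regressing router logits onto competition responses."""
    n_experts, dim = len(SCALES), len(data[0])
    w = [[0.0] * dim for _ in range(n_experts)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n_experts)]
        for x in data:
            targets = competition_responses(x)
            for j, (r, s) in enumerate(zip(router_logits(w, x), targets)):
                for i in range(dim):
                    grad[j][i] += 2.0 * (r - s) * x[i] / len(data)
        for j in range(n_experts):
            for i in range(dim):
                w[j][i] -= lr * grad[j][i]
    return w


def route_with_router(w: List[Vector], x: Vector, k: int) -> List[int]:
    """At inference, pick the top-k experts from router logits alone."""
    logits = router_logits(w, x)
    return sorted(range(len(logits)), key=lambda j: logits[j], reverse=True)[:k]


random.seed(0)
data = [[random.random(), random.random()] for _ in range(64)]
zero_router = [[0.0, 0.0] for _ in SCALES]
w = train_router(data)
print(mse(zero_router, data), mse(w, data))
print(route_with_router(w, [1.0, 0.1], k=2))
```

The design choice being illustrated is cost: training pays for running every expert to obtain the regression targets, while routing with the learned router needs only one dot product per expert. A linear router cannot reproduce the norm-based responses exactly, so its choices only approximate the competition outcome.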
