CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition

Nam V. Nguyen; Huy Nguyen; Quang Pham; Van Nguyen; Savitha Ramasamy; Nhat Ho

arXiv:2505.13380 · cs.AI · May 20, 2025

TL;DR

This paper introduces CompeteSMoE, a novel training method for sparse mixture of experts that uses a competition-based routing mechanism, improving efficiency, robustness, and scalability in large language and vision models.

Contribution

It proposes a new competition-based routing mechanism for SMoE, with theoretical guarantees and an effective training algorithm for large models.

Findings

01 Better sample efficiency than softmax routing

02 Strong performance on language and vision tasks

03 Robustness and scalability demonstrated

Abstract

Sparse mixture of experts (SMoE) offers an appealing solution to scale up model complexity beyond the means of increasing the network's depth or width. However, we argue that effective SMoE training remains challenging because of the suboptimal routing process, in which the experts that perform computation do not directly contribute to the routing decision. In this work, we propose competition, a novel mechanism to route tokens to experts with the highest neural response. Theoretically, we show that the competition mechanism enjoys better sample efficiency than traditional softmax routing. Furthermore, we develop CompeteSMoE, a simple yet effective algorithm to train large language models by deploying a router to learn the competition policy, thus enjoying strong performance at a low training overhead. Our extensive empirical evaluations on both the visual instruction tuning and…
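The competition idea in the abstract (route each token to the experts with the highest neural response) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define "neural response", so the sketch assumes it is the Euclidean norm of each expert's output. The toy experts, the softmax gating over the winners, and the names `make_expert`, `neural_response`, and `competition_route` are all hypothetical.

```python
import math
import random


def make_expert(dim, rng):
    """Hypothetical toy expert: one random linear layer followed by tanh."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def expert(token):
        return [math.tanh(sum(w * x for w, x in zip(row, token))) for row in weights]

    return expert


def neural_response(output):
    """Assumed response score: the Euclidean norm of the expert output."""
    return math.sqrt(sum(v * v for v in output))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def competition_route(token, experts, k):
    """Run every expert, keep the k with the highest response, mix their outputs."""
    outputs = [expert(token) for expert in experts]
    responses = [neural_response(out) for out in outputs]
    # Experts compete: the k largest responses win the token.
    winners = sorted(range(len(experts)), key=lambda i: responses[i], reverse=True)[:k]
    # Assumed gating: softmax over the winners' responses.
    gates = softmax([responses[i] for i in winners])
    mixed = [
        sum(g * outputs[i][d] for g, i in zip(gates, winners))
        for d in range(len(token))
    ]
    return winners, gates, mixed, responses


rng = random.Random(0)
experts = [make_expert(4, rng) for _ in range(6)]
token = [rng.gauss(0.0, 1.0) for _ in range(4)]
winners, gates, mixed, responses = competition_route(token, experts, k=2)
print("winning experts:", winners)
print("gate weights:", [round(g, 3) for g in gates])
```

In this sketch every expert is evaluated for every token, which is costly. That cost is presumably why the abstract describes training a router to learn the competition policy rather than running the full competition on every token.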

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing · Multimodal Machine Learning Applications

Methods: Softmax