Improving Routing in Sparse Mixture of Experts with Graph of Tokens

Tam Nguyen; Ngoc N. Tran; Khai Nguyen; Richard G. Baraniuk

arXiv:2505.00792 · cs.LG · May 5, 2025

Open Access

TL;DR

This paper improves routing stability in Sparse Mixture of Experts (SMoE) models by letting token similarities and the attention mechanism inform expert selection, which the authors report yields more robust and accurate models.

Contribution

The paper proposes the Similarity-Aware and Attention-Aware (S)MoE models that incorporate token interactions to reduce routing fluctuations and improve robustness in SMoE architectures.
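
To make the idea of token interactions during expert selection concrete, here is a minimal sketch in plain Python. It is not the authors' formulation: the helper names (`router_logits`, `top_k_experts`, `similarity_mixed_logits`), the dot-product similarity, and the `temperature` parameter are all assumptions chosen for illustration. The baseline routes each token independently by its own top-k scores. The similarity-mixed variant replaces each token's scores with a similarity-weighted average over all tokens, so similar tokens tend to receive similar routing decisions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def router_logits(tokens, router_weights):
    """Per-token expert scores: tokens is N x D, router_weights is E x D."""
    return [[dot(tok, w) for w in router_weights] for tok in tokens]


def top_k_experts(logits, k):
    """Indices of the k highest-scoring experts for each token."""
    return [sorted(range(len(row)), key=lambda e: -row[e])[:k] for row in logits]


def similarity_mixed_logits(tokens, logits, temperature=1.0):
    """Hypothetical similarity-aware scores (an assumption, not the paper's equations).

    Each token's scores become a weighted average of every token's scores,
    with weights given by a softmax over token-token dot products.
    """
    n = len(tokens)
    num_experts = len(logits[0])
    mixed = []
    for i in range(n):
        sims = softmax([dot(tokens[i], tokens[j]) / temperature for j in range(n)])
        mixed.append(
            [sum(sims[j] * logits[j][e] for j in range(n)) for e in range(num_experts)]
        )
    return mixed


if __name__ == "__main__":
    # Toy data: three 2-dimensional tokens and three experts (illustrative values only).
    tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    router_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    logits = router_logits(tokens, router_weights)
    print("independent routing:", top_k_experts(logits, k=1))
    mixed = similarity_mixed_logits(tokens, logits)
    print("similarity-mixed routing:", top_k_experts(mixed, k=1))
```

In this sketch, identical tokens always end up with identical mixed scores, which is the property that couples their expert selection. A real implementation would operate on batched tensors and learned projections rather than Python lists.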

Findings

01. Significant reduction in routing fluctuations.

02. Enhanced model accuracy across tasks.

03. Increased robustness compared to baseline MoE-Transformer.

Abstract

Sparse Mixture of Experts (SMoE) has emerged as a key to achieving unprecedented scalability in deep learning. By activating only a small subset of parameters per sample, SMoE achieves an exponential increase in parameter counts while maintaining a constant computational overhead. However, SMoE models are susceptible to routing fluctuations--changes in the routing of a given input to its target expert--at the late stage of model training, leading to model non-robustness. In this work, we unveil the limitation of SMoE through the perspective of the probabilistic graphical model (PGM). Through this PGM framework, we highlight the independence in the expert-selection of tokens, which exposes the model to routing fluctuation and non-robustness. Alleviating this independence, we propose the novel Similarity-Aware (S)MoE, which considers interactions between tokens during expert selection. We…
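
The abstract describes routing fluctuation as a change in the routing of a given input to its target expert. One simple way to quantify that, offered here as an illustrative assumption rather than the metric used in the paper, is the fraction of tokens whose assigned expert differs between two training checkpoints. The function name `routing_fluctuation` and the list-of-expert-indices input format are hypothetical.

```python
def routing_fluctuation(assign_before, assign_after):
    """Fraction of tokens whose top-1 expert changed between two checkpoints.

    Both arguments are equal-length sequences of expert indices, one per token,
    for the same inputs evaluated at two different points in training.
    """
    if len(assign_before) != len(assign_after):
        raise ValueError("assignments must cover the same tokens")
    if not assign_before:
        return 0.0
    changed = sum(1 for a, b in zip(assign_before, assign_after) if a != b)
    return changed / len(assign_before)


if __name__ == "__main__":
    # Illustrative assignments for four tokens; only the second token changes expert.
    before = [0, 1, 2, 1]
    after = [0, 2, 2, 1]
    print(routing_fluctuation(before, after))  # 0.25
```

A value near zero late in training would indicate stable routing, while a persistently high value would correspond to the instability the abstract describes.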

Taxonomy

Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Expert finding and Q&A systems