Selective Sinkhorn Routing for Improved Sparse Mixture of Experts
Duc Anh Nguyen, Huu Binh Ta, Nhuan Le Duc, Tan M. Nguyen, Toan Tran

TL;DR
This paper introduces Selective Sinkhorn Routing (SSR), a method for Sparse Mixture-of-Experts (SMoE) models that formulates token-to-expert assignment as an optimal transport problem, improving expert balancing and model performance while reducing training overhead.
Contribution
The paper proposes SSR, a Sinkhorn-based routing mechanism that replaces auxiliary losses, enabling more effective expert balancing with less complexity and faster training.
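To make "Sinkhorn-based routing" concrete, here is a minimal, generic sketch of balanced routing with Sinkhorn-Knopp iterations. It is not the authors' SSR implementation. The truncated abstract does not describe SSR's selective mechanism, so this shows only the standard building block. The function names `sinkhorn_balance` and `route_top_k`, the temperature `eps`, and the uniform marginals (each token carries mass 1/T, each expert receives mass 1/E) are illustrative assumptions.

```python
import math
import random


def sinkhorn_balance(logits, n_iters=100, eps=1.0):
    """Hypothetical helper: map router logits (T tokens x E experts) to an
    approximately balanced transport plan with Sinkhorn-Knopp iterations.

    Assumed marginals: every row (token) sums to 1/T and every column
    (expert) sums to 1/E, i.e. uniform expert load.
    """
    T, E = len(logits), len(logits[0])
    # Gibbs kernel K = exp(logits / eps). Subtracting each row's max only
    # rescales that row, which the row scaling u absorbs, so the plan is unchanged.
    K = []
    for row in logits:
        m = max(row)
        K.append([math.exp((x - m) / eps) for x in row])

    u, v = [1.0] * T, [1.0] * E
    r, c = 1.0 / T, 1.0 / E
    for _ in range(n_iters):
        # Rescale rows so each token's total mass matches r.
        u = [r / sum(K[t][e] * v[e] for e in range(E)) for t in range(T)]
        # Rescale columns so each expert's total mass matches c.
        v = [c / sum(u[t] * K[t][e] for t in range(T)) for e in range(E)]
    # Plan P = diag(u) K diag(v).
    return [[u[t] * K[t][e] * v[e] for e in range(E)] for t in range(T)]


def route_top_k(plan, k=1):
    """Hypothetical helper: pick each token's k highest-mass experts."""
    return [sorted(range(len(row)), key=lambda e: -row[e])[:k] for row in plan]


# Usage: 8 tokens, 4 experts, random router scores.
random.seed(0)
logits = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(8)]
plan = sinkhorn_balance(logits, n_iters=100)
print(route_top_k(plan, k=2))
```

Every iteration repeats two full normalization passes over the token-by-expert matrix. This gives some intuition for why running Sinkhorn at every routing step adds training overhead. Smaller `eps` pushes the plan toward a hard assignment but usually needs more iterations to converge.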
Findings
SSR achieves higher accuracy on language and image tasks.
SSR reduces training time compared to previous methods.
SSR enhances robustness to input corruption.
Abstract
Sparse Mixture-of-Experts (SMoE) has gained prominence as a scalable and computationally efficient architecture, enabling significant growth in model capacity without incurring additional inference costs. However, existing SMoE models often rely on auxiliary losses (e.g., z-loss, load balancing) and additional trainable parameters (e.g., noisy gating) to encourage expert diversity, leading to objective misalignment and increased model complexity. Moreover, existing Sinkhorn-based methods suffer from significant training overhead due to their heavy reliance on the computationally expensive Sinkhorn algorithm. In this work, we formulate token-to-expert assignment as an optimal transport problem, incorporating constraints to ensure balanced expert utilization. We demonstrate that introducing a minimal degree of optimal transport-based routing enhances SMoE performance without requiring…
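For reference, a standard way to write token-to-expert assignment as an optimal transport problem with balance constraints is the entropy-regularized form below. This formulation is an assumption, since the truncated abstract does not state the paper's exact objective. It uses T tokens, E experts, router scores $S \in \mathbb{R}^{T \times E}$, transport plan $P$, and temperature $\varepsilon > 0$, all introduced here for illustration. The column constraint forces each expert to receive an equal share of the tokens.

$$
\min_{P \in \mathbb{R}_{\ge 0}^{T \times E}} \; -\sum_{t=1}^{T}\sum_{e=1}^{E} P_{te} S_{te} \;-\; \varepsilon H(P),
\qquad H(P) = -\sum_{t,e} P_{te}\log P_{te},
$$
$$
\text{s.t.}\quad P\,\mathbf{1}_E = \tfrac{1}{T}\mathbf{1}_T, \qquad P^{\top}\mathbf{1}_T = \tfrac{1}{E}\mathbf{1}_E .
$$

The minimizer has the form $P = \operatorname{diag}(u)\,K\,\operatorname{diag}(v)$ with $K_{te} = \exp(S_{te}/\varepsilon)$. The Sinkhorn algorithm finds $u$ and $v$ by alternately rescaling the rows and columns of $K$ until both marginal constraints hold.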
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
