Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks
Long Shi, Bingyan Ou, Kang Wei, Weihao Zhu, Zhe Wang, and Zhiyong Chen

TL;DR
This paper introduces Stable-MoE, a Lyapunov-based token routing framework for distributed mixture-of-experts training on edge networks, improving efficiency, stability, and performance amid resource heterogeneity and stochastic workloads.
Contribution
It presents a novel Lyapunov optimization approach for real-time token routing and resource allocation in edge MoE systems, ensuring stability and enhanced throughput.
Findings
Achieves at least 40% higher system throughput.
Improves test accuracy by at least 5%.
Demonstrates stability and efficiency in resource-heterogeneous environments.
Abstract
The sparse activation mechanism of the mixture-of-experts (MoE) model empowers edge intelligence with enhanced training efficiency and reduced computational resource consumption. However, traditional token routing in distributed MoE training faces significant challenges in resource-constrained edge networks characterized by heterogeneous computing capabilities and stochastic token arrivals, where it inevitably suffers from workload backlog, resource inefficiency, and performance degradation. To address this issue, we propose a novel Lyapunov-based token routing framework for distributed MoE training over resource-heterogeneous edge networks, termed Stable-MoE. Specifically, we formulate a stochastic optimization problem to maximize both system throughput and gating consistency by optimizing the token routing strategy and computational resource allocation, while ensuring long-term stability of…
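Illustrative sketch
The abstract is truncated here and does not give the paper's actual algorithm, so the following is only a minimal sketch of the generic Lyapunov drift-plus-penalty idea it alludes to, not the Stable-MoE method itself. Everything in it is an assumption made for illustration: the function names route_token and simulate, the trade-off weight V, the per-expert service rates, the uniform random arrival process, and the use of random gate scores as a stand-in for gating consistency. Each token goes to the expert that maximizes V times its gate score minus that expert's current queue backlog, so heavily backlogged experts are avoided even when the gate prefers them.

```python
import random


def route_token(gate_scores, queues, V):
    """Drift-plus-penalty choice: maximize V * gate_score - queue backlog.

    A larger V favors following the gate; a smaller V favors short queues.
    (Hypothetical helper, not taken from the paper.)
    """
    return max(range(len(queues)), key=lambda k: V * gate_scores[k] - queues[k])


def simulate(num_slots=200, service_rates=(4, 2, 1), V=5.0, max_arrivals=6, seed=0):
    """Simulate heterogeneous experts with stochastic token arrivals.

    service_rates[k] is the assumed number of tokens expert k can process
    per time slot; arrivals per slot are uniform in [0, max_arrivals].
    Returns the final queue backlogs and the total number of tokens served.
    """
    rng = random.Random(seed)
    queues = [0.0] * len(service_rates)
    served = 0.0
    for _ in range(num_slots):
        # Route each arriving token using the current backlogs.
        for _ in range(rng.randint(0, max_arrivals)):
            scores = [rng.random() for _ in service_rates]  # stand-in gate output
            k = route_token(scores, queues, V)
            queues[k] += 1
        # Each expert then serves up to its per-slot capacity.
        for k, mu in enumerate(service_rates):
            done = min(queues[k], mu)
            queues[k] -= done
            served += done
    return queues, served


if __name__ == "__main__":
    final_queues, total_served = simulate()
    print("final backlogs:", final_queues)
    print("tokens served:", total_served)
```

In this toy setting the mean arrival rate (3 tokens per slot) is below the total service capacity (7 tokens per slot), so a backlog-aware rule can keep the queues bounded. Raising V makes routing follow the gate scores more closely at the cost of longer queues on the slower experts.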
Taxonomy
Topics: IoT and Edge/Fog Computing · Age of Information Optimization · Privacy-Preserving Technologies in Data
