Accelerating Distributed MoE Training and Inference with Lina
Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong Xu

TL;DR
This paper introduces Lina, a system that accelerates distributed MoE training and inference by addressing the all-to-all communication bottleneck, so that larger sparsely activated models can be trained and served at lower cost.
Contribution
The paper systematically analyzes why all-to-all communication becomes the bottleneck in distributed MoE training and inference, and presents Lina, a system whose scheduling prioritizes all-to-all over concurrent allreduce whenever feasible.
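The idea of letting all-to-all go ahead of concurrent allreduce can be illustrated with a small priority queue. This is a minimal sketch, not Lina's implementation: the class `PriorityCommQueue`, the `PRIORITY` table, and the operation names are hypothetical and exist only for illustration, and no real communication library is called.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Lower number = served first. These two labels and their ranks are
# illustrative assumptions, not taken from the paper.
PRIORITY = {"all_to_all": 0, "allreduce": 1}


@dataclass(order=True)
class CommOp:
    priority: int
    seq: int                      # submission order, breaks ties (FIFO)
    kind: str = field(compare=False)
    name: str = field(compare=False)


class PriorityCommQueue:
    """Toy queue that always releases pending all-to-all ops first."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def submit(self, kind, name):
        op = CommOp(PRIORITY[kind], next(self._seq), kind, name)
        heapq.heappush(self._heap, op)

    def drain(self):
        """Return op names in the order they would be launched."""
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).name)
        return order


def demo_order():
    q = PriorityCommQueue()
    q.submit("allreduce", "grad_chunk_0")
    q.submit("allreduce", "grad_chunk_1")
    q.submit("all_to_all", "expert_dispatch")
    q.submit("allreduce", "grad_chunk_2")
    q.submit("all_to_all", "expert_combine")
    return q.drain()
```

In this toy, both all-to-all operations are released before any allreduce chunk even though they were submitted later, and operations of the same kind keep their submission order.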
Findings
Lina reduces training step time by up to 1.73x.
Lina decreases 95th percentile inference time by an average of 1.63x.
During inference, Lina balances the transfer size and bandwidth of all-to-all communication across devices.
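To put the speedup factors above in more familiar terms, the sketch below converts an "N x" reduction into the fraction of time saved. It assumes that "reduces by N x" means baseline time divided by Lina's time equals N; the function name `time_saved_fraction` is hypothetical.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of baseline time removed, assuming speedup = baseline / new."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Factors quoted in the findings above.
training_saved = time_saved_fraction(1.73)   # roughly 0.42
inference_saved = time_saved_fraction(1.63)  # roughly 0.39
```

Under that assumption, a 1.73x reduction corresponds to a step that takes roughly 42% less time, and a 1.63x reduction to roughly 39% less.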
Abstract
Scaling model parameters improves model quality at the price of high computation overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) architecture, have sub-linear scaling of computation cost with model size, thus providing opportunities to train and serve a larger model at lower cost than their dense counterparts. However, distributed MoE training and inference is inefficient, mainly due to the interleaved all-to-all communication during model computation. This paper makes two main contributions. First, we systematically analyze all-to-all overhead in distributed MoE and present the main causes for it to be the bottleneck in training and inference, respectively. Second, we design and build Lina to address the all-to-all bottleneck head-on. Lina opportunistically prioritizes all-to-all over the concurrent allreduce whenever feasible using tensor…
Taxonomy
Topics: Tensor decomposition and applications · Traffic Prediction and Management Techniques · Context-Aware Activity Recognition Systems
