Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts
Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, Qi Hou, Weihao Cui, Size Zheng, Li-Wen Chang, Quan Chen, and Xin Liu

TL;DR
COMET introduces a fine-grained communication-computation overlapping system for MoE models, significantly reducing communication bottlenecks and accelerating large-scale language model training.
Contribution
The paper presents a novel fine-grained overlapping approach that uses data dependency analysis and task rescheduling to improve the efficiency of distributed MoE models.
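Why reordering work around data dependencies helps can be shown with a deliberately simplified model. This is our own assumption, not COMET's actual algorithm. Each compute task needs one input tile delivered by communication, tiles arrive at different (hypothetical) times, and tasks run one after another on a single compute stream. The names `makespan` and `reschedule_by_readiness` are illustrative only.

```python
from typing import List, Sequence


def makespan(order: Sequence[int], ready: Sequence[float], dur: Sequence[float]) -> float:
    """Finish time when tasks run back to back on one compute stream.

    A task may start only after its input tile has arrived (ready[i])
    and the previous task in `order` has finished.
    """
    t = 0.0
    for i in order:
        t = max(t, ready[i]) + dur[i]  # wait for data, then compute
    return t


def reschedule_by_readiness(ready: Sequence[float]) -> List[int]:
    """Earliest-ready-first order: run tasks in the order their data arrives."""
    return sorted(range(len(ready)), key=lambda i: ready[i])


# Hypothetical arrival times of each task's input tile, and equal task durations.
ready = [6.0, 0.0, 4.0, 2.0]
dur = [2.0, 2.0, 2.0, 2.0]

print(makespan([0, 1, 2, 3], ready, dur))                   # fixed order
print(makespan(reschedule_by_readiness(ready), ready, dur))  # dependency-aware order
```

Here the fixed order 0, 1, 2, 3 finishes at 14.0 because the stream idles waiting for tile 0. The readiness order 1, 3, 2, 0 finishes at 8.0. Earliest-ready-first is a classic rule that minimizes makespan for a single stream with release times. This sketch ignores multiple streams, tile shapes, and memory limits.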
Findings
Achieves a 1.96x speedup for a single MoE layer.
Delivers an average 1.71x speedup in end-to-end training.
Has saved millions of GPU hours in production environments.
Abstract
Mixture-of-experts (MoE) has been extensively employed to scale large language models to trillion-plus parameters while maintaining a fixed computational cost. Developing large MoE models in distributed settings runs into large communication overhead: with popular models and frameworks, the inter-device communication of a MoE layer can occupy 47% of the entire model's execution time. Existing methods therefore propose pipelining the communication in a MoE layer with its computation so that the two overlap. However, these coarse-grained overlapping schemes notably impair computational efficiency, and the latency they hide is suboptimal. To this end, we present COMET, an optimized MoE system with fine-grained communication-computation overlapping. Leveraging data dependency analysis and task rescheduling, COMET achieves precise fine-grained…
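The coarse-grained trade-off described above can be made concrete with a toy cost model. This is our own illustration, not taken from the paper. A layer's communication and computation are split into n equal chunks and run as a two-stage pipeline. A hypothetical fixed `overhead` per compute chunk stands in for the efficiency lost when one large kernel is split into smaller ones. All times are arbitrary units.

```python
def pipelined_time(comm: float, comp: float, n_chunks: int, overhead: float) -> float:
    """Finish time of a two-stage (communicate -> compute) pipeline over equal chunks.

    comm, comp: total communication and computation time of the layer.
    overhead:   hypothetical fixed cost added to every compute chunk.
    n_chunks=1 is the unpipelined baseline: communicate everything, then compute.
    """
    a = comm / n_chunks             # per-chunk communication time
    b = comp / n_chunks + overhead  # per-chunk computation time
    # Chunk k's compute waits for chunk k's communication; the slower stage sets the pace.
    return a + b + (n_chunks - 1) * max(a, b)


for n in (1, 2, 4, 8, 16, 32):
    print(f"chunks={n:2d}  time={pipelined_time(40.0, 60.0, n, 2.0)}")
```

With comm=40, comp=60 and overhead=2, the times for 1, 2, 4, 8, 16 and 32 chunks are 102.0, 84.0, 78.0, 81.0, 94.5 and 125.25. Few chunks hide little communication, while many chunks pay too much per-chunk overhead. Even the best setting stays well above the perfect-overlap bound of max(comm, comp) = 60, which is the gap a finer-grained scheme aims to close.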
Taxonomy
Topics: Advanced Neural Network Applications · Mobile Crowdsensing and Crowdsourcing · Big Data and Digital Economy
Methods: Mixture of Experts
