EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks
Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang, Xiangyu Ye, Yabo Duan

TL;DR
EmbRace is a communication framework that accelerates distributed training of sparse NLP models. It reduces communication overhead through Sparsity-aware Hybrid Communication and 2D Communication Scheduling.
Contribution
It introduces Sparsity-aware Hybrid Communication and 2D Communication Scheduling to improve scalability and efficiency in training sparse NLP models.
Findings
Achieves up to 2.41X speedup over baselines.
Effectively overlaps sparse communication with computation.
Reduces communication overhead for sparse parameters.
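To see why sparsity matters for communication volume, here is a back-of-envelope sketch. It is a generic illustration, not EmbRace's cost model. It assumes the dense baseline is a ring AllReduce of the full embedding gradient, with each worker sending 2(p-1)/p of the tensor. It assumes the naive sparse alternative is a ring AllGather of each worker's touched rows, sent as one index plus one dim-wide row each. The table size, embedding width, worker count and touched-row count are hypothetical, and the helper names are illustrative.

```python
def dense_ring_allreduce_elems(num_rows: int, dim: int, workers: int) -> float:
    """Elements each worker sends in a ring AllReduce of a dense num_rows x dim gradient."""
    return 2 * (workers - 1) / workers * num_rows * dim


def sparse_ring_allgather_elems(touched_rows: int, dim: int, workers: int) -> int:
    """Elements each worker sends in a ring AllGather of its touched rows.

    Each worker forwards p-1 equally sized chunks; a chunk holds one index
    plus one dim-wide gradient row per touched row (assumed encoding).
    """
    return (workers - 1) * touched_rows * (dim + 1)


# Hypothetical setting: 32k-row embedding table, width 512, 8 workers,
# and a batch that touches 2k distinct rows per worker.
dense = dense_ring_allreduce_elems(num_rows=32_000, dim=512, workers=8)
sparse = sparse_ring_allgather_elems(touched_rows=2_000, dim=512, workers=8)
print(f"dense: {dense:,.0f}  sparse: {sparse:,}  ratio: {dense / sparse:.2f}")
```

Under these assumptions, sending only touched rows is several times cheaper than the dense AllReduce. The AllGather term, however, grows with the (p-1) factor while the ring AllReduce stays near twice the tensor size, so naive sparse AllGather can lose at larger scale. That trade-off is one reason to consider other collectives such as AlltoAll.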
Abstract
Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well for dense models like image classification models, we find that these DL frameworks have relatively low scalability for sparse models like natural language processing (NLP) models that have highly sparse embedding tables. Most existing works overlook the sparsity of model parameters and thus suffer from significant but unnecessary communication overhead. In this paper, we propose EmbRace, an efficient communication framework to accelerate communications of distributed training for sparse models. EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training, so as to reduce the communication overhead of highly sparse parameters. To effectively overlap sparse…
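The abstract names the ingredients of Sparsity-aware Hybrid Communication (AlltoAll plus model parallelism for highly sparse parameters) but is truncated before the details. The following is therefore a minimal single-process sketch of the general idea, not EmbRace's implementation. The embedding table is sharded by rows across workers in a model-parallel layout. Workers fetch only the rows their batch touches and push back only those rows' gradients, using AlltoAll exchanges instead of a dense AllReduce. Assumptions, all hypothetical: round-robin row placement (the `owner` helper), an in-memory `alltoall` stand-in for a real collective such as `torch.distributed.all_to_all`, and a plain SGD update averaged over workers.

```python
from typing import Dict, List

Vector = List[float]


def owner(row: int, workers: int) -> int:
    # Hypothetical placement: embedding row r lives on worker r % workers.
    return row % workers


def alltoall(send: List[List[list]]) -> List[List[list]]:
    # Simulated AlltoAll: message send[src][dst] arrives as recv[dst][src].
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]


def sparse_lookup(batches: List[List[int]], shards: List[Dict[int, Vector]],
                  workers: int) -> List[Dict[int, Vector]]:
    # Round 1: each worker sends its unique row ids to the rows' owners.
    requests = [[[] for _ in range(workers)] for _ in range(workers)]
    for w, batch in enumerate(batches):
        for row in sorted(set(batch)):
            requests[w][owner(row, workers)].append(row)
    recv_ids = alltoall(requests)
    # Round 2: each owner replies with the requested rows, in request order.
    replies = [[[shards[o][r] for r in recv_ids[o][src]] for src in range(workers)]
               for o in range(workers)]
    recv_rows = alltoall(replies)
    # Each worker now holds only the rows its own batch touches.
    return [{row: vec
             for o in range(workers)
             for row, vec in zip(requests[w][o], recv_rows[w][o])}
            for w in range(workers)]


def sparse_update(grads: List[Dict[int, Vector]], shards: List[Dict[int, Vector]],
                  workers: int, lr: float) -> None:
    # Push each worker's row gradients to the owning worker (one AlltoAll).
    send = [[[] for _ in range(workers)] for _ in range(workers)]
    for w, g in enumerate(grads):
        for row, vec in g.items():
            send[w][owner(row, workers)].append((row, vec))
    recv = alltoall(send)
    for o in range(workers):
        # The owner sums the gradients it received for each of its rows...
        acc: Dict[int, Vector] = {}
        for src in range(workers):
            for row, vec in recv[o][src]:
                acc[row] = [a + b for a, b in zip(acc[row], vec)] if row in acc else list(vec)
        # ...then applies an SGD step using the data-parallel average.
        for row, total in acc.items():
            shards[o][row] = [x - lr * t / workers for x, t in zip(shards[o][row], total)]


# Usage: 2 workers, a 6-row table of width 2, row r initialised to [r, r].
workers = 2
shards = [{r: [float(r), float(r)] for r in range(6) if owner(r, workers) == w}
          for w in range(workers)]
local = sparse_lookup([[0, 1, 1, 4], [1, 3, 5]], shards, workers)
sparse_update([{1: [1.0, 1.0]}, {1: [3.0, 3.0]}], shards, workers, lr=0.5)
```

In this sketch, each worker sends at most its touched rows (plus their ids) per exchange, regardless of how many workers there are. Rows no batch touches never move, and each row's optimizer step happens only on its owner. How EmbRace partitions parameters and schedules these exchanges against computation is not described in the text above, so this should be read only as an illustration of the communication pattern.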
