EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks

Shengwei Li; Zhiquan Lai; Dongsheng Li; Yiming Zhang; Xiangyu Ye; Yabo Duan

arXiv:2110.09132·cs.LG·June 28, 2022

TL;DR

EmbRace is a novel communication framework that significantly accelerates distributed training of sparse NLP models by reducing communication overhead through sparsity-aware hybrid communication and optimized scheduling.

Contribution

It introduces Sparsity-aware Hybrid Communication and 2D Communication Scheduling to improve scalability and efficiency in training sparse NLP models.

Findings

01. Achieves up to 2.41× speedup over baselines.

02. Effectively overlaps sparse communication with computation.

03. Reduces communication overhead for sparse parameters.

Abstract

Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well for dense models such as image classification models, we find that these DL frameworks have relatively low scalability for sparse models such as natural language processing (NLP) models, which have highly sparse embedding tables. Most existing works overlook the sparsity of model parameters and thus suffer from significant but unnecessary communication overhead. In this paper, we propose EmbRace, an efficient communication framework to accelerate communication in distributed training of sparse models. EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training, so as to reduce the communication overhead of highly sparse parameters. To effectively overlap sparse…
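
To make the abstract's point about highly sparse embedding tables concrete, the sketch below counts how many values one embedding-table gradient contains when sent densely versus when only the rows touched by a batch are sent. It is an illustration of why ignoring sparsity wastes communication, not EmbRace's algorithm: the function name `embedding_grad_footprint`, the vocabulary size, the embedding dimension, the batch size, and the "one index plus one row of values" sparse encoding are all assumptions chosen for the example.

```python
import random


def embedding_grad_footprint(vocab_size, embed_dim, token_ids):
    """Return (dense_elems, sparse_elems) for one embedding-table gradient.

    Only rows whose token id appears in the batch receive a non-zero
    gradient, so a sparse encoding needs to carry just those rows.
    """
    touched_rows = set(token_ids)
    # Dense form: every row of the table is communicated, zeros included.
    dense_elems = vocab_size * embed_dim
    # Assumed sparse form: one row index plus embed_dim values per touched row.
    sparse_elems = len(touched_rows) * (embed_dim + 1)
    return dense_elems, sparse_elems


# Hypothetical sizes, chosen only for illustration.
random.seed(0)
vocab_size, embed_dim = 32_000, 512
token_ids = [random.randrange(vocab_size) for _ in range(4_096)]

dense, sparse = embedding_grad_footprint(vocab_size, embed_dim, token_ids)
print(f"dense elements:  {dense}")
print(f"sparse elements: {sparse}")
print(f"sparse/dense ratio: {sparse / dense:.3f}")
```

With these assumed sizes a batch can touch at most 4,096 of the 32,000 rows, so the sparse form carries a small fraction of the dense payload. Real token distributions are skewed toward frequent words, which would typically lower the number of distinct rows further.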
