SSD-SGD: Communication sparsification for distributed deep learning training

Yemao Xu; Dezun Dong; Yawei Zhao; Weixia Xu; Xiangke Liao

arXiv:2012.05396·cs.DC·April 12, 2021

Open Access

TL;DR

This paper introduces SSD-SGD, a communication sparsification method for distributed deep learning that combines synchronous and asynchronous updates to improve training speed while maintaining accuracy.

Contribution

The paper proposes SSD-SGD, a novel hybrid synchronization algorithm with global gradient for local update (GLU), balancing communication efficiency and convergence accuracy in distributed training.

Findings

01

SSD-SGD improves training speed by up to 110%.

02

It maintains good convergence accuracy across datasets.

03

The method effectively balances synchronization quality and communication sparsification.

Abstract

Intensive communication and synchronization cost for gradients and parameters is a well-known bottleneck of distributed deep learning training. Based on the observations that synchronous SGD (SSGD) obtains good convergence accuracy while asynchronous SGD (ASGD) delivers a faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits, aiming at tackling the communication bottleneck via communication sparsification. SSD-SGD explores both global synchronous updates in the parameter servers and asynchronous local updates in the workers in each periodic iteration. The periodic and flexible synchronization makes SSD-SGD achieve good convergence accuracy and fast training speed. To the best of our knowledge, we strike a new balance between synchronization quality and communication sparsification, and improve the trade-off between accuracy and training…
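
The general pattern the abstract describes, local updates on the workers with only periodic synchronization through a parameter server, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the truncated abstract does not give the SSD-SGD or GLU update rule, so the code uses plain periodic parameter averaging instead. The function name, the parameters (num_workers, delay_steps, rounds, lr), and the quadratic objective are all hypothetical, and the workers are simulated sequentially in one process rather than run asynchronously.

```python
import random


def local_sgd_periodic_sync(num_workers=4, delay_steps=4, rounds=50, lr=0.1, seed=0):
    """Toy periodic-synchronization loop minimising f(w) = 0.5 * (w - target)^2.

    Assumption: this is generic periodic parameter averaging, used only to
    illustrate communication sparsification. It is NOT the SSD-SGD update rule.
    """
    rng = random.Random(seed)
    target = 3.0
    global_w = 0.0   # parameter held by the simulated parameter server
    comm_rounds = 0  # number of worker <-> server synchronizations

    for _ in range(rounds):
        local_ws = []
        for _ in range(num_workers):
            w = global_w  # each worker starts from the last synchronized value
            for _ in range(delay_steps):
                # Noisy gradient of the quadratic objective.
                noisy_grad = (w - target) + rng.gauss(0.0, 0.1)
                w -= lr * noisy_grad  # local update, no communication
            local_ws.append(w)
        # Synchronous step on the server: average the workers' parameters.
        global_w = sum(local_ws) / num_workers
        comm_rounds += 1

    return global_w, comm_rounds


if __name__ == "__main__":
    w, comms = local_sgd_periodic_sync()
    print(f"final parameter: {w:.3f}, synchronizations: {comms}")
```

With the defaults, each worker takes 200 local steps but synchronizes only 50 times; setting delay_steps=1 and rounds=200 gives the same number of steps with a synchronization after every one, which is the communication cost the sparsification is meant to reduce.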

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Stochastic Gradient Optimization Techniques