SSD-SGD: Communication sparsification for distributed deep learning training
Yemao Xu, Dezun Dong, Yawei Zhao, Weixia Xu, Xiangke Liao

TL;DR
This paper introduces SSD-SGD, a communication sparsification method for distributed deep learning that combines synchronous and asynchronous updates to improve training speed while maintaining accuracy.
Contribution
The paper proposes SSD-SGD, a novel hybrid synchronization algorithm with global gradient for local update (GLU), balancing communication efficiency and convergence accuracy in distributed training.
Findings
SSD-SGD speeds up distributed training by up to 110%.
It maintains good convergence accuracy across datasets.
The method effectively balances synchronization quality and communication sparsification.
Abstract
Intensive communication and synchronization cost for gradients and parameters is the well-known bottleneck of distributed deep learning training. Based on the observations that synchronous SGD (SSGD) obtains good convergence accuracy while asynchronous SGD (ASGD) delivers a faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits, aiming at tackling the communication bottleneck via communication sparsification. SSD-SGD explores both global synchronous updates in the parameter servers and asynchronous local updates in the workers in each periodic iteration. The periodic and flexible synchronization makes SSD-SGD achieve good convergence accuracy and fast training speed. To the best of our knowledge, we strike a new balance between synchronization quality and communication sparsification, and improve the trade-off between accuracy and training…
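The general idea in the abstract, a synchronous update on the parameter server combined with local updates on the workers and only periodic synchronization, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' SSD-SGD: the function name `periodic_sync_sgd`, the 1-D quadratic objective, the `sync_period` parameter, and the choice to push gradients every step while pulling weights only every few steps are all hypothetical, and GLU is omitted.

```python
def periodic_sync_sgd(targets, steps=200, sync_period=4, lr=0.1):
    """Toy 1-D simulation of local updates with periodic global synchronization.

    Assumption: worker i minimizes 0.5 * (w - targets[i]) ** 2, so the
    global optimum is the mean of `targets`. Not the paper's algorithm.
    """
    n = len(targets)
    global_w = 0.0              # weight held by the simulated parameter server
    local_w = [global_w] * n    # each worker's local copy of the weight
    pulls = 0                   # number of times workers pulled global weights

    for step in range(1, steps + 1):
        # Each worker computes a gradient at its own local weight.
        grads = [local_w[i] - targets[i] for i in range(n)]

        # Server side: one synchronous update with the averaged gradient.
        global_w -= lr * sum(grads) / n

        if step % sync_period == 0:
            # Periodic synchronization: every worker pulls the global weight.
            local_w = [global_w] * n
            pulls += 1
        else:
            # Between synchronizations: purely local updates, no pull.
            local_w = [local_w[i] - lr * grads[i] for i in range(n)]

    return global_w, pulls


if __name__ == "__main__":
    w, pulls = periodic_sync_sgd([1.0, 2.0, 3.0, 6.0])
    print(f"global weight: {w:.6f}, weight pulls: {pulls}")
```

In this sketch the workers pull weights on only one step in every `sync_period`, which is the sense in which communication is sparsified, while the server still applies an averaged gradient at every step. On this toy objective the global weight approaches the mean of the targets; real models would not behave this cleanly.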
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Stochastic Gradient Optimization Techniques
