DC-S3GD: Delay-Compensated Stale-Synchronous SGD for Large-Scale Decentralized Neural Network Training

Alessandro Rigazzi

arXiv:1911.02516·cs.LG·November 7, 2019

TL;DR

This paper introduces DC-S3GD, a decentralized (no parameter server) stale-synchronous SGD method that overlaps computation with communication, compensates for the resulting error with a first-order correction of the gradients, and reports state-of-the-art results when training convolutional neural networks with large batches.

Contribution

It presents a decentralized, stale-synchronous variant of DC-ASGD that removes the parameter server and applies a first-order gradient correction to compensate for the error introduced by overlapping computation and communication.

Findings

01

Achieved state-of-the-art results on CNN training with large batches.

02

Demonstrated effective overlap of computation and communication.

03

Validated the approach empirically by training convolutional neural networks with large batches.

Abstract

Data parallelism has become the de facto standard for training Deep Neural Networks on multiple processing units. In this work we propose DC-S3GD, a decentralized (without Parameter Server) stale-synchronous version of the Delay-Compensated Asynchronous Stochastic Gradient Descent (DC-ASGD) algorithm. In our approach, we allow for the overlap of computation and communication, and compensate for the inherent error with a first-order correction of the gradients. We prove the effectiveness of our approach by training Convolutional Neural Networks with large batches and achieving state-of-the-art results.
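
The first-order correction mentioned in the abstract can be illustrated with a minimal sketch. It assumes DC-S3GD uses the same element-wise form as DC-ASGD, where a gradient g computed at stale weights is adjusted to g + λ · g ⊙ g ⊙ (w_current − w_stale), with g ⊙ g serving as a cheap diagonal approximation of the Hessian. The function name, the variable names, and the value of `lam` are hypothetical and are not taken from the paper.

```python
from typing import List


def delay_compensated_gradient(
    grad: List[float],
    stale_weights: List[float],
    current_weights: List[float],
    lam: float = 0.04,  # hypothetical default, not a value from the paper
) -> List[float]:
    """Apply a first-order correction to a gradient computed at stale weights.

    Uses the DC-ASGD form g + lam * g * g * (w_current - w_stale),
    evaluated element-wise. Assumed here to carry over to DC-S3GD.
    """
    if not (len(grad) == len(stale_weights) == len(current_weights)):
        raise ValueError("all vectors must have the same length")
    return [
        g + lam * g * g * (w_now - w_old)
        for g, w_old, w_now in zip(grad, stale_weights, current_weights)
    ]


# Gradient evaluated at weights that have since moved during communication.
g = [0.5, -1.0, 2.0]
w_stale = [1.0, 1.0, 1.0]
w_now = [1.2, 0.9, 1.0]

corrected = delay_compensated_gradient(g, w_stale, w_now, lam=0.5)
print(corrected)
```

Where a weight has not changed since the gradient was computed, the correction term vanishes and the gradient is returned unchanged. Larger weight drift or a larger gradient magnitude produces a larger adjustment.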
