OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training

Yemao Xu; Dezun Dong; Weixia Xu; Xiangke Liao

arXiv:2005.06728 · cs.LG · May 15, 2020 · 1 citation

TL;DR

OD-SGD is a distributed training algorithm that combines the advantages of synchronous and asynchronous SGD (SSGD and ASGD), training faster than SSGD while matching its convergence quality.

Contribution

This paper introduces OD-SGD, which the authors describe as the first attempt to combine the features of SSGD and ASGD, with the goal of improving both the speed and the convergence of distributed training.

Findings

1. OD-SGD achieves accuracy similar to or better than that of SSGD.
2. OD-SGD trains faster than SSGD, and its training speed even surpasses that of ASGD.
3. Experimental results on MNIST, CIFAR-10, and ImageNet validate its effectiveness.
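Why could a one-step delay make training faster than SSGD? A minimal sketch, assuming an idealized cost model that is not taken from the paper: in SSGD every iteration pays computation time plus communication time, because workers wait at the synchronization barrier; if the synchronization for step t is instead allowed to finish while step t+1 is being computed, the steady-state cost per iteration is the larger of the two. The function names and timings below are hypothetical, and the model ignores stragglers, bandwidth contention, and ASGD entirely.

```python
# Idealized per-iteration time model (an assumption for illustration, not
# measurements from the paper). compute_s: forward/backward time on a worker;
# comm_s: time to push gradients and pull updated weights.


def ssgd_iteration_time(compute_s: float, comm_s: float) -> float:
    """Synchronous SGD: compute, then block at the barrier until communication finishes."""
    return compute_s + comm_s


def one_step_delay_iteration_time(compute_s: float, comm_s: float) -> float:
    """One-step delay: communication for step t overlaps computation of step t+1,
    so in steady state the slower of the two dominates (perfect overlap assumed)."""
    return max(compute_s, comm_s)


def speedup(compute_s: float, comm_s: float) -> float:
    """Ratio of SSGD time per iteration to one-step-delay time per iteration."""
    return ssgd_iteration_time(compute_s, comm_s) / one_step_delay_iteration_time(compute_s, comm_s)


if __name__ == "__main__":
    # Hypothetical (compute, communication) timings in seconds.
    for compute_s, comm_s in [(0.10, 0.02), (0.10, 0.10), (0.05, 0.20)]:
        print(f"compute={compute_s:.2f}s comm={comm_s:.2f}s "
              f"speedup={speedup(compute_s, comm_s):.2f}x")
```

Under these assumptions the gain is capped at 2x, reached when computation and communication take equal time; real speedups depend on the model, the network, and how well the overlap works in practice.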

Abstract

The training of modern deep neural networks calls for large amounts of computation, which is often provided by GPUs or other specialized accelerators. To scale out and achieve faster training, two update algorithms are mainly used in distributed training: the Synchronous SGD algorithm (SSGD) and the Asynchronous SGD algorithm (ASGD). SSGD reaches a good convergence point, but its training speed is slowed down by the synchronization barrier. ASGD trains faster, but its convergence point is worse than that of SSGD. To fully exploit the advantages of both, we propose a novel technique named One-step Delay SGD (OD-SGD) that combines their strengths in the training process. As a result, we can achieve a convergence point similar to that of SSGD and a training speed similar to that of ASGD. To the best of our knowledge, we make the first attempt to combine the…
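The abstract contrasts waiting for fresh weights (SSGD) with computing on stale ones (ASGD). The toy below, written under explicit assumptions, illustrates why a one-step delay need not hurt convergence if each worker corrects its stale weights locally. Each hypothetical worker k minimizes 0.5 * (w - c_k)^2. `delayed_sgd` simply applies gradients that are one step stale, while `one_step_delay_with_local_update` is our own reading of the one-step-delay idea: the server applies the averaged gradient to the global weights, and each worker, instead of waiting for that result, starts its next step from the last global weights it has received plus a local step with its own gradient. These update rules, the function names, and the learning rate are assumptions, not the paper's algorithm or the code in its official repository.

```python
from typing import List

# Toy objective (an assumption for illustration): worker k holds the loss
# f_k(w) = 0.5 * (w - c_k) ** 2, so the averaged loss is minimized at mean(c_k).


def grad(w: float, c: float) -> float:
    """Gradient of 0.5 * (w - c) ** 2 with respect to w."""
    return w - c


def ssgd(centers: List[float], lr: float, steps: int, w0: float = 0.0) -> List[float]:
    """Synchronous SGD: every step uses the averaged gradient at the current weights."""
    w, history = w0, [w0]
    for _ in range(steps):
        g = sum(grad(w, c) for c in centers) / len(centers)
        w = w - lr * g
        history.append(w)
    return history


def delayed_sgd(centers: List[float], lr: float, steps: int, w0: float = 0.0) -> List[float]:
    """Naive one-step staleness: the update producing w[t+1] uses gradients taken at w[t-1]."""
    w_prev, w, history = w0, w0, [w0]
    for _ in range(steps):
        g = sum(grad(w_prev, c) for c in centers) / len(centers)
        w_prev, w = w, w - lr * g
        history.append(w)
    return history


def one_step_delay_with_local_update(centers: List[float], lr: float, steps: int,
                                     w0: float = 0.0) -> List[float]:
    """Hypothetical OD-SGD-style loop: global update on the server, plus a local
    update on each worker that compensates for weights one step behind the server."""
    w_global = w0
    w_local = [w0] * len(centers)  # one copy of the weights per worker
    history = [w0]
    for _ in range(steps):
        grads = [grad(wl, c) for wl, c in zip(w_local, centers)]
        # Local update: start the next step from the global weights available now
        # (before this step's global update), corrected with the worker's own gradient.
        w_local = [w_global - lr * g for g in grads]
        # Global update on the server; in a real system it overlaps the next compute step.
        w_global = w_global - lr * sum(grads) / len(grads)
        history.append(w_global)
    return history


if __name__ == "__main__":
    centers = [1.0, 2.0, 3.0, 6.0]  # hypothetical data; optimum at mean = 3.0
    for name, fn in [("SSGD", ssgd), ("delayed", delayed_sgd),
                     ("one-step delay + local", one_step_delay_with_local_update)]:
        traj = fn(centers, lr=0.8, steps=30)
        print(f"{name:>24}: max w = {max(traj):.3f}, final w = {traj[-1]:.6f}")
```

With the deliberately large learning rate of 0.8, plain staleness overshoots the optimum at 3.0, while the locally corrected variant follows the SSGD trajectory. In this linear toy the two coincide up to rounding, because the average of the workers' locally updated weights equals the new global weights; on real, nonlinear networks such a correction can only be approximate.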

Code & Models

Repositories

CynthiaProtector/OD-SGD (MXNet, official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM

Methods: Stochastic Gradient Descent