OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao

TL;DR
OD-SGD is a novel distributed training algorithm that combines the advantages of Synchronous SGD (SSGD) and Asynchronous SGD (ASGD), achieving faster training without sacrificing convergence quality.
Contribution
This paper introduces OD-SGD, which the authors describe as, to the best of their knowledge, the first attempt to combine the features of SSGD and ASGD, improving both distributed training efficiency and convergence performance.
Findings
OD-SGD achieves similar or better accuracy than SSGD.
OD-SGD trains faster than SSGD and surpasses ASGD in speed.
Experimental results on MNIST, CIFAR-10, and ImageNet validate its effectiveness.
Abstract
Training modern deep neural networks requires large amounts of computation, which is often provided by GPUs or other specialized accelerators. To scale out for faster training, distributed training mainly relies on two update algorithms: Synchronous SGD (SSGD) and Asynchronous SGD (ASGD). SSGD reaches a good convergence point, but its training speed is limited by the synchronization barrier. ASGD trains faster, but its convergence point is worse than that of SSGD. To fully exploit the advantages of both, we propose a novel algorithm named One-step Delay SGD (OD-SGD) that combines their strengths during training. As a result, OD-SGD achieves a convergence point similar to that of SSGD and a training speed similar to that of ASGD. To the best of our knowledge, we make the first attempt to combine the…
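To make the SSGD/ASGD contrast concrete, here is a toy, single-process Python simulation on a 1-D quadratic loss split across four hypothetical workers. The functions `ssgd`, `asgd`, and `od_sgd` and the parameters `lr`, `local_lr`, and `staleness` are illustrative names, not the authors' code. The ASGD model (round-robin arrival, fixed staleness, step scaled by 1/n) is an assumption. The `od_sgd` function is an assumed reading of the one-step-delay idea and not the paper's exact update rule. In this reading, each worker starts its next step from global weights that are one step behind and applies a local update with its own fresh gradient to compensate, while a parameter server still aggregates all gradients synchronously.

```python
import random


def noisy_grad(w, target, noise, rng):
    """Gradient of the toy per-worker loss 0.5 * (w - target)**2, plus optional Gaussian noise."""
    return (w - target) + rng.gauss(0.0, noise)


def ssgd(targets, lr, steps, noise=0.0, seed=0):
    """Synchronous SGD: every step waits for all workers, then applies the averaged gradient."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        grads = [noisy_grad(w, c, noise, rng) for c in targets]  # barrier: all gradients at the same w
        w -= lr * sum(grads) / len(grads)
    return w


def asgd(targets, lr, steps, staleness, noise=0.0, seed=0):
    """Asynchronous SGD: workers push one at a time, each gradient computed on stale weights."""
    rng = random.Random(seed)
    n = len(targets)
    history = [0.0]  # global weights after each server update
    for t in range(steps * n):
        c = targets[t % n]  # round-robin arrival order (an assumption)
        stale_w = history[max(0, len(history) - 1 - staleness)]
        # Scale by 1/n so n asynchronous updates are comparable to one synchronous step.
        history.append(history[-1] - (lr / n) * noisy_grad(stale_w, c, noise, rng))
    return history[-1]


def od_sgd(targets, lr, local_lr, steps, noise=0.0, seed=0):
    """Hypothetical one-step-delay scheme: synchronous global update plus a local compensating step."""
    rng = random.Random(seed)
    n = len(targets)
    w_global = 0.0     # weights held by the (assumed) parameter server
    local = [0.0] * n  # weights each worker actually computes gradients on
    for _ in range(steps):
        grads = [noisy_grad(local[i], targets[i], noise, rng) for i in range(n)]
        # Worker side: instead of waiting for this step's global result, start the next
        # step from the global weights already available (one step behind), corrected
        # by a local update with the worker's own fresh gradient.
        local = [w_global - local_lr * g for g in grads]
        # Server side: aggregate all workers' gradients synchronously, as in SSGD.
        w_global -= lr * sum(grads) / n
    return w_global


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0, 6.0]  # one toy data point per worker; the optimum is the mean, 3.0
    print("SSGD  :", ssgd(targets, lr=0.1, steps=200))
    print("ASGD  :", asgd(targets, lr=0.1, steps=200, staleness=2))
    print("OD-SGD:", od_sgd(targets, lr=0.1, local_lr=0.1, steps=200))
```

In this toy, if all workers held identical data and `local_lr == lr`, the local step would reproduce the pending global update exactly, so the sketch would match SSGD step for step while never blocking on the server. With heterogeneous workers, as in `targets` above, the local weights only approximate the global update.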
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
Methods: Stochastic Gradient Descent
