Guided parallelized stochastic gradient descent for delay compensation
Anuraganand Sharma

TL;DR
This paper introduces a guided parallelized stochastic gradient descent algorithm that effectively compensates for delays in asynchronous and synchronous SGD, improving neural network training efficiency and classification accuracy.
Contribution
The paper proposes a novel guided SGD method that reduces delay-induced variance in parallel SGD, enhancing convergence and accuracy in deep learning models.
Findings
Guided SGD mitigates delay effects in parallel training.
The approach achieves accuracy close to sequential SGD.
Experimental results show improved classification performance.
Abstract
Stochastic gradient descent (SGD) and its variants have been used effectively to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice because it optimizes the error function sequentially. This has led to the development of parallel SGD algorithms, such as asynchronous SGD (ASGD) and synchronous SGD (SSGD), to train deep neural networks. However, parallelization introduces high variance due to the delay in parameter (weight) updates. We address this delay in our proposed algorithm and try to minimize its impact. We employ guided SGD (gSGD), which encourages consistent examples to steer the convergence by compensating for the unpredictable deviation caused by the delay. Its convergence rate is also similar to that of A/SSGD; however, some additional (parallel) processing is required to compensate…
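The delay the abstract refers to can be illustrated with a small simulation. The sketch below is not the paper's gSGD algorithm and does not implement its compensation step; it only shows, under simplifying assumptions, what a stale gradient is: each update uses a gradient computed from the weight as it was a fixed number of updates earlier. The 1-D least-squares model, the fixed delay, and the names make_data, mse, and sgd_with_delay are all hypothetical choices made for illustration.

```python
import random
from collections import deque


def make_data(n=200, true_w=3.0, seed=0):
    """Synthetic 1-D regression data: y = true_w * x + small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x + rng.gauss(0.0, 0.1)) for x in xs]


def mse(w, data):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def sgd_with_delay(data, delay=0, lr=0.05, epochs=5, seed=1):
    """SGD where each gradient is evaluated at the weight from `delay`
    updates ago (delay=0 is ordinary sequential SGD).

    Assumption: a fixed delay stands in for the variable staleness that
    asynchronous workers would see in practice.
    """
    rng = random.Random(seed)
    w = 0.0
    # Holds the last delay+1 weights; history[0] is the oldest (stale) one.
    history = deque([w], maxlen=delay + 1)
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            stale_w = history[0]
            grad = 2.0 * (stale_w * x - y) * x  # gradient at the stale weight
            w -= lr * grad                      # applied to the current weight
            history.append(w)
    return w


if __name__ == "__main__":
    data = make_data()
    for d in (0, 8):
        w = sgd_with_delay(data, delay=d)
        print(f"delay={d}: w={w:.4f}, mse={mse(w, data):.5f}")
```

Running the script prints the learned weight and error for a sequential run and a delayed run, so the effect of staleness on this toy problem can be compared directly. A compensation scheme such as the one the paper proposes would add further logic around the update step.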
Taxonomy
Methods: Stochastic Gradient Descent
