Guided parallelized stochastic gradient descent for delay compensation

Anuraganand Sharma

arXiv:2101.07259·cs.LG·February 13, 2024

TL;DR

This paper introduces a guided parallelized stochastic gradient descent algorithm that effectively compensates for delays in asynchronous and synchronous SGD, improving neural network training efficiency and classification accuracy.

Contribution

The paper proposes a novel guided SGD method that reduces delay-induced variance in parallel SGD, enhancing convergence and accuracy in deep learning models.

Findings

1. Guided SGD mitigates delay effects in parallel training.
2. The approach achieves accuracy close to sequential SGD.
3. Experimental results show improved classification performance.

Abstract

Stochastic gradient descent (SGD) and its variants have been used effectively to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice because it optimizes the error function sequentially. This has led to the development of parallel SGD algorithms, such as asynchronous SGD (ASGD) and synchronous SGD (SSGD), to train deep neural networks. However, parallel SGD introduces high variance due to the delay in parameter (weight) updates. Our proposed algorithm addresses this delay and tries to minimize its impact. We employ guided SGD (gSGD), which encourages consistent examples to steer the convergence by compensating for the unpredictable deviation caused by the delay. Its convergence rate is similar to that of A/SSGD; however, some additional (parallel) processing is required to compensate…
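
The delay the abstract refers to can be illustrated with a small simulation. The sketch below is not the paper's gSGD algorithm; it only shows, under simplifying assumptions, how applying gradients computed on stale weights can make updates deviate. The function name `delayed_sgd`, the one-dimensional objective f(w) = 0.5 (w - 3)^2, the noise level, and the learning rate are all hypothetical choices made for illustration.

```python
import random


def delayed_sgd(delay, steps=200, lr=0.4, seed=0):
    """Toy delayed SGD on f(w) = 0.5 * (w - 3)^2 (illustrative assumption).

    Each gradient is evaluated at the weights from `delay` steps ago,
    mimicking a worker that read the parameters before later updates
    were applied. Returns the final weight and the full weight history.
    """
    rng = random.Random(seed)
    target = 3.0
    w = 0.0
    history = [w]  # history[-1] is always the current weight
    for _ in range(steps):
        # Weights the (hypothetical) worker saw `delay` updates ago;
        # early on, fall back to the initial weights.
        stale_w = history[max(0, len(history) - 1 - delay)]
        # Noisy gradient of the toy objective, evaluated at the stale weights.
        grad = (stale_w - target) + rng.gauss(0.0, 0.1)
        # The update is applied to the *current* weights.
        w = w - lr * grad
        history.append(w)
    return w, history


if __name__ == "__main__":
    for d in (0, 5):
        _, hist = delayed_sgd(delay=d)
        spread = max(abs(x - 3.0) for x in hist[-50:])
        print(f"delay={d}: max deviation over last 50 steps = {spread:.3g}")
```

With no delay, the iterates in this toy setting settle near the minimum at w = 3. With a delay of 5 steps and the same learning rate, the stale gradients overshoot and the iterates oscillate with growing amplitude, which is the kind of delay-induced deviation a compensation scheme has to counteract.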

Taxonomy

Methods: Stochastic Gradient Descent