Accelerating Parallel Stochastic Gradient Descent via Non-blocking Mini-batches

Haoze He; Parijat Dube

arXiv:2211.00889·cs.LG·November 10, 2022·1 cites

Open Access

TL;DR

This paper introduces Non-blocking SGD, a decentralized training method that splits batches into mini-batches to reduce straggler delays, accelerating convergence in heterogeneous environments.

Contribution

It proposes a novel Non-blocking SGD approach that improves synchronization efficiency and convergence speed by splitting batches and using gradient accumulation in decentralized settings.
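As a worked illustration of why accumulating gradients over finished mini-batches is a natural fit for batch splitting, consider the following identity. The notation ($B$, $B_j$, $m$, $k$, $\ell_i$, $w$) is ours and is not taken from the paper, and the mini-batches are assumed to be of equal size.

```latex
% Full-batch mean gradient as the average of m equal-size mini-batch gradients,
% and the estimate available when only k of the m mini-batches have finished.
g(B) = \frac{1}{|B|} \sum_{i \in B} \nabla \ell_i(w)
     = \frac{1}{m} \sum_{j=1}^{m} g(B_j),
\qquad
g(B_j) = \frac{1}{|B_j|} \sum_{i \in B_j} \nabla \ell_i(w),
\qquad
\hat{g}_k = \frac{1}{k} \sum_{j=1}^{k} g(B_j), \quad 1 \le k \le m.
```

When every mini-batch finishes ($k = m$), $\hat{g}_m$ equals the full-batch gradient $g(B)$. When a straggler finishes only $k < m$ mini-batches, $\hat{g}_k$ is the mean gradient over the samples it did process. Whether the paper normalizes by $k$ or by $m$ is not stated in this summary, so the choice of $k$ here is an assumption.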

Findings

01. Speeds up training by up to 2x compared to state-of-the-art methods.

02. Effectively mitigates straggler delays in heterogeneous environments.

03. Maintains convergence guarantees with gradient accumulation.

Abstract

SOTA decentralized SGD algorithms can overcome the bandwidth bottleneck at the parameter server by using communication collectives like Ring All-Reduce for synchronization. While the parameter updates in distributed SGD may happen asynchronously, there is still a synchronization barrier to make sure that the local training epoch at every learner is complete before the learners can advance to the next epoch. The delays in waiting for the slowest learners (stragglers) remain a problem in the synchronization steps of these state-of-the-art decentralized frameworks. In this paper, we propose the (de)centralized Non-blocking SGD (Non-blocking SGD), which can address the straggler problem in a heterogeneous environment. The main idea of Non-blocking SGD is to split the original batch into mini-batches, then accumulate the gradients and update the model based on finished mini-batches. The…
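The main idea stated in the abstract (split the batch, accumulate gradients, update from the finished mini-batches) can be sketched in a few lines. This is a single-process toy under our own assumptions, not the paper's implementation: there is no communication or Ring All-Reduce, the model is a one-parameter least-squares fit, and the names `split_batch`, `non_blocking_step`, `train`, and the `finished` parameter (how many mini-batches a learner completed before the synchronization point) are hypothetical.

```python
import random


def minibatch_gradient(w, batch):
    """Mean gradient of 0.5 * (w * x - y) ** 2 over one mini-batch of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def split_batch(batch, num_minibatches):
    """Split a batch into equal-size consecutive mini-batches.

    Any remainder that does not fill a whole mini-batch is dropped.
    """
    size = len(batch) // num_minibatches
    return [batch[i * size:(i + 1) * size] for i in range(num_minibatches)]


def non_blocking_step(w, batch, num_minibatches, finished, lr):
    """One SGD update that uses only the first `finished` mini-batches.

    Assumption: the accumulated gradient is averaged over the number of
    finished mini-batches; the source summary does not specify this.
    """
    if not 1 <= finished <= num_minibatches:
        raise ValueError("finished must be between 1 and num_minibatches")
    done = split_batch(batch, num_minibatches)[:finished]
    accumulated = sum(minibatch_gradient(w, mb) for mb in done)
    return w - lr * accumulated / len(done)


def train(steps=200, batch_size=32, num_minibatches=4, lr=0.5, seed=0):
    """Fit y = 3x while a simulated straggler finishes a random number of mini-batches."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        batch = [(x, 3.0 * x) for x in xs]
        finished = rng.randint(1, num_minibatches)  # simulated straggler progress
        w = non_blocking_step(w, batch, num_minibatches, finished, lr)
    return w


if __name__ == "__main__":
    print(train())
```

In this sketch the update never waits for the unfinished mini-batches, so a slow learner still contributes a gradient computed from whatever it completed. In a real multi-learner setting the accumulated gradients would be combined across learners at the synchronization step, which this toy omits.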

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Memory and Neural Computing · Distributed Control Multi-Agent Systems

Methods: Stochastic Gradient Descent