STSyn: Speeding Up Local SGD with Straggler-Tolerant Synchronization

Feng Zhu; Jingjing Zhang; Xin Wang

arXiv:2210.03521 · cs.LG · May 30, 2023

TL;DR

STSyn is a local SGD method that tolerates stragglers: in each synchronization round it waits only for the $K$ fastest workers, keeps every worker computing, and uses every completed local update. Its convergence is proven, and experiments show it outperforms state-of-the-art schemes.

Contribution

The paper introduces STSyn, a straggler-tolerant local SGD strategy that improves efficiency by synchronizing without waiting for stragglers, and it supports the strategy with a rigorous convergence analysis.

Findings

01 STSyn reduces training time compared to existing methods.

02 It achieves higher communication efficiency through selective synchronization.

03 Experimental results confirm its superiority over state-of-the-art schemes.

Abstract

Synchronous local stochastic gradient descent (local SGD) leaves some workers idle and incurs random delays caused by slow, straggling workers, because it waits for every worker to complete the same number of local updates. In this paper, to mitigate stragglers and improve communication efficiency, a novel local SGD strategy, named STSyn, is developed. The key idea is to wait only for the $K$ fastest workers in each synchronization round, while keeping all workers computing continually and making full use of every effective (completed) local update of each worker, stragglers included. An analysis of the average wall-clock time, the average number of local updates, and the average number of uploading workers per round is provided to gauge the performance of STSyn. The convergence of STSyn is also rigorously established, even when the objective function is nonconvex. Experimental results show the…
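The round structure described in the abstract can be illustrated with a small timing simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function name `simulate_round`, the per-worker cap of `u` local updates, and the rule that a round ends once the `k`-th fastest worker has finished `u` updates are all hypothetical choices made for illustration, since the abstract does not spell these details out.

```python
from itertools import accumulate
from typing import List, Tuple


def simulate_round(
    durations: List[List[float]], k: int, u: int
) -> Tuple[float, List[int], List[int]]:
    """Simulate the timing of one straggler-tolerant synchronization round.

    durations[i][j] is the (hypothetical) time worker i needs for its
    (j+1)-th local update. Returns the round's wall-clock time, the number
    of local updates each worker completed, and the indices of the workers
    that have something to upload.
    """
    if not 1 <= k <= len(durations):
        raise ValueError("k must be between 1 and the number of workers")
    if u < 1 or any(len(d) < u for d in durations):
        raise ValueError("each worker needs at least u >= 1 update durations")

    # finish[i][j] = time at which worker i completes its (j+1)-th update.
    # Assumption: a worker performs at most u local updates per round.
    finish = [list(accumulate(d[:u])) for d in durations]

    # Assumption: the round ends when the k-th fastest worker finishes
    # its u-th local update.
    round_time = sorted(f[-1] for f in finish)[k - 1]

    # All workers keep computing until then; count what each one completed.
    completed = [sum(t <= round_time for t in f) for f in finish]

    # Workers with at least one completed update contribute to the round.
    uploaders = [i for i, c in enumerate(completed) if c > 0]
    return round_time, completed, uploaders


if __name__ == "__main__":
    # Four workers with constant per-update times of 1, 2, 5 and 10 units.
    times = [[1.0] * 3, [2.0] * 3, [5.0] * 3, [10.0] * 3]
    print(simulate_round(times, k=2, u=3))
```

Averaging the three returned quantities over many rounds with randomly drawn durations would give empirical counterparts of the per-round wall-clock time, number of local updates, and number of uploading workers that the abstract says are analyzed.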

Taxonomy

Topics: Interconnection Networks and Systems · Network Time Synchronization Technologies · Advanced MIMO Systems Optimization

Methods: Stochastic Gradient Descent · Local SGD