STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

Shuheng Shen; Yifei Cheng; Jingchang Liu; Linli Xu

arXiv:2006.06377 · cs.LG · December 16, 2020

TL;DR

This paper introduces STL-SGD, a stagewise Local SGD method that gradually increases the communication period as the learning rate decreases, accelerating convergence and reducing communication complexity in distributed machine learning.

Contribution

STL-SGD increases the communication period stage by stage as the learning rate decreases, achieving faster convergence and lower communication complexity than existing Local SGD methods.

Findings

1. STL-SGD maintains the same convergence rate as mini-batch SGD.

2. It achieves $O(N \log T)$ communication complexity for strongly convex objectives.

3. Experiments show STL-SGD outperforms traditional Local SGD on convex and non-convex tasks.

Abstract

Distributed parallel stochastic gradient descent algorithms are workhorses for large-scale machine learning tasks. Among them, local stochastic gradient descent (Local SGD) has attracted significant attention due to its low communication complexity. Previous studies prove that the communication complexity of Local SGD with a fixed or an adaptive communication period is on the order of $O(N^{\frac{3}{2}} T^{\frac{1}{2}})$ and $O(N^{\frac{3}{4}} T^{\frac{3}{4}})$ when the data distributions on clients are identical (IID) or otherwise (Non-IID), respectively, where $N$ is the number of clients and $T$ is the number of iterations. In this paper, to accelerate the convergence by reducing the communication complexity, we propose STagewise Local SGD (STL-SGD), which increases the communication period gradually as the learning rate decreases. We prove that STL-SGD can keep…
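The stagewise idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the text above: the toy quadratic objective, the noise level, the choice to halve the learning rate while doubling both the communication period and the stage length, and the helper names `stagewise_schedule` and `stl_sgd_sketch`. The paper's actual schedule and analysis may differ.

```python
import random


def stagewise_schedule(num_stages, lr0, period0, iters0):
    """Return a list of (learning_rate, communication_period, iterations) per stage.

    Assumed rule (illustrative only): each new stage halves the learning rate
    and doubles both the communication period and the number of iterations.
    """
    schedule = []
    lr, period, iters = lr0, period0, iters0
    for _ in range(num_stages):
        schedule.append((lr, period, iters))
        lr /= 2
        period *= 2
        iters *= 2
    return schedule


def stl_sgd_sketch(num_clients=4, num_stages=4, lr0=0.2, period0=1,
                   iters0=16, seed=0):
    """Run Local SGD with a stagewise communication period on a toy problem.

    Toy objective (an assumption): f(x) = 0.5 * (x - 3)^2, with Gaussian
    noise added to each client's gradient. Returns the averaged model and
    the number of communication (averaging) rounds performed.
    """
    rng = random.Random(seed)
    target = 3.0
    models = [0.0] * num_clients  # one local model per client
    comm_rounds = 0
    for lr, period, iters in stagewise_schedule(num_stages, lr0, period0, iters0):
        for t in range(1, iters + 1):
            # Local step: every client updates its own copy independently.
            for i in range(num_clients):
                grad = (models[i] - target) + rng.gauss(0.0, 0.1)
                models[i] -= lr * grad
            # Communication step: average the models once every `period` steps.
            if t % period == 0:
                avg = sum(models) / num_clients
                models = [avg] * num_clients
                comm_rounds += 1
    return sum(models) / num_clients, comm_rounds


if __name__ == "__main__":
    x, rounds = stl_sgd_sketch()
    print(f"averaged model: {x:.4f}, communication rounds: {rounds}")
```

Under this assumed schedule the number of averaging rounds per stage stays constant (here `iters0 / period0`), because the stage length and the communication period double together. The total number of rounds therefore grows with the number of stages rather than with the total iteration count, which gives some intuition for why a stagewise period can reduce communication.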

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications

Methods: Local SGD · Stochastic Gradient Descent