Variance Reduced Local SGD with Lower Communication Complexity

Xianfeng Liang; Shuheng Shen; Jingchang Liu; Zhen Pan; Enhong Chen; Yifei Cheng

arXiv:1912.12844·cs.LG·January 1, 2020·90 cites

Open Access · 1 Repo

TL;DR

This paper introduces VRL-SGD (Variance Reduced Local SGD), a distributed optimization method that lowers the communication complexity of Local SGD from O(T^{3/4} N^{3/4}) to O(T^{1/2} N^{3/2}) while keeping linear iteration speedup when data distributions differ across workers.

Contribution

The paper proposes VRL-SGD, which lowers communication complexity in distributed SGD with non-i.i.d. data, outperforming existing Local SGD methods.

Findings

01

VRL-SGD achieves a communication complexity of O(T^{1/2} N^{3/2}), lower than the O(T^{3/4} N^{3/4}) that Local SGD requires with non-identical data.

02

VRL-SGD maintains linear iteration speedup with non-identical datasets.

03

Experimental results show VRL-SGD outperforms Local SGD with diverse data.
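To get a feel for the two bounds, the sketch below evaluates T^{3/4} N^{3/4} (the Local SGD requirement quoted in the abstract for non-identical data) and T^{1/2} N^{3/2} (the VRL-SGD bound). All hidden constants are set to 1, which is an assumption made purely for illustration, and the function names are hypothetical. Big-O bounds hide constants, so the values indicate scaling only, not real round counts.

```python
import math


def local_sgd_comm(T: int, N: int) -> float:
    """Local SGD bound T^(3/4) * N^(3/4), constants assumed to be 1."""
    return (T * N) ** 0.75


def vrl_sgd_comm(T: int, N: int) -> float:
    """VRL-SGD bound T^(1/2) * N^(3/2), constants assumed to be 1."""
    return math.sqrt(T) * N ** 1.5


if __name__ == "__main__":
    N = 8
    for T in (100, N ** 3, 10 ** 6):
        print(f"T={T:>8}  Local SGD ~ {local_sgd_comm(T, N):10.1f}"
              f"  VRL-SGD ~ {vrl_sgd_comm(T, N):10.1f}")
```

Under this simplification the two expressions coincide at T = N^3, and the VRL-SGD expression is the smaller one once T > N^3, since T^{3/4} N^{3/4} > T^{1/2} N^{3/2} is equivalent to T^{1/4} > N^{3/4}.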

Abstract

To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants have been widely adopted, which apply multiple workers in parallel to speed up training. Among them, Local SGD has gained much attention due to its lower communication cost. Nevertheless, when the data distribution on workers is non-identical, Local SGD requires $O(T^{\frac{3}{4}} N^{\frac{3}{4}})$ communications to maintain its linear iteration speedup property, where $T$ is the total number of iterations and $N$ is the number of workers. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to further reduce the communication complexity. Benefiting from eliminating the dependency on the gradient variance among workers, we theoretically prove that VRL-SGD achieves a linear iteration speedup with a lower communication complexity $O(T^{\frac{1}{2}}…
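As a rough illustration of the baseline the abstract refers to, here is a minimal sketch of plain Local SGD: each worker takes k local gradient steps, then all workers average their models in one communication round. It is a toy, not the authors' code and not VRL-SGD itself. The one-dimensional quadratic objectives f_i(x) = 0.5 * a_i * (x - c_i)^2, the exact (noise-free) gradients, the constant step size, and the function name `local_sgd` are all assumptions chosen so the effect of non-identical workers is easy to see.

```python
def local_sgd(curvatures, centers, steps, k, lr=0.1, x0=0.0):
    """Plain Local SGD on toy objectives f_i(x) = 0.5 * a_i * (x - c_i)**2.

    One worker per (a_i, c_i) pair. Workers average their models every
    k steps. Returns the averaged model and the number of communications.
    """
    n = len(curvatures)
    xs = [x0] * n
    communications = 0
    for t in range(1, steps + 1):
        # Local step: each worker follows only its own gradient a_i * (x - c_i).
        xs = [x - lr * a * (x - c) for x, a, c in zip(xs, curvatures, centers)]
        if t % k == 0:
            # Communication round: replace every local model by the average.
            avg = sum(xs) / n
            xs = [avg] * n
            communications += 1
    return sum(xs) / n, communications


if __name__ == "__main__":
    a, c = [1.0, 3.0], [0.0, 1.0]  # two workers with non-identical objectives
    # Minimizer of the average objective: sum(a_i * c_i) / sum(a_i) = 0.75
    for k in (1, 10):
        x, rounds = local_sgd(a, c, steps=200, k=k)
        print(f"k={k:>2}  rounds={rounds:>3}  x={x:.4f}")
```

In this toy setting, k = 1 (averaging after every step) reaches the minimizer 0.75 of the average objective but uses 200 communication rounds, while k = 10 uses only 20 rounds and settles near 0.60, away from the minimizer. That gap comes from each worker drifting toward its own optimum between rounds, and it is one way to picture why non-identical data forces Local SGD to communicate more often. The page does not give the VRL-SGD update rule, so no correction term is sketched here.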

Code & Models

Repositories

zerolxf/VRL-SGD
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Stochastic Gradient Optimization Techniques · Face and Expression Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Local SGD · Stochastic Gradient Descent