Variance Reduced Local SGD with Lower Communication Complexity
Xianfeng Liang, Shuheng Shen, Jingchang Liu, Zhen Pan, Enhong Chen, Yifei Cheng

TL;DR
This paper introduces VRL-SGD, a distributed optimization method that lowers the communication complexity of Local SGD to O(T^{1/2} N^{3/2}) while preserving linear iteration speedup, even when data distributions differ across workers.
Contribution
The paper proposes VRL-SGD, which lowers communication complexity in distributed SGD with non-i.i.d. data, outperforming existing Local SGD methods.
Findings
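VRL-SGD achieves a communication complexity of O(T^{1/2} N^{3/2}), where T is the total number of iterations and N is the number of workers; this is lower than what Local SGD needs on non-identical data.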
VRL-SGD maintains linear iteration speedup with non-identical datasets.
Experimental results show VRL-SGD outperforms Local SGD when the data across workers are diverse.
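To make the communication bounds concrete, they can be evaluated for sample values of T and N. This is a rough sketch only: big-O bounds hide constant factors, so the results indicate orders of magnitude rather than actual round counts, and the function names are hypothetical. The Local SGD rate O(T^{3/4} N^{3/4}) used for comparison is the figure given in the paper's abstract for non-identical data. Setting the two expressions equal shows that the VRL-SGD bound is the smaller one once T > N^3.

```python
def local_sgd_order(T: int, N: int) -> float:
    """Order of communication rounds for Local SGD on non-identical data:
    T^(3/4) * N^(3/4), constants ignored."""
    return T ** 0.75 * N ** 0.75


def vrl_sgd_order(T: int, N: int) -> float:
    """Order of communication rounds for VRL-SGD: T^(1/2) * N^(3/2),
    constants ignored."""
    return T ** 0.5 * N ** 1.5


def vrl_bound_is_smaller(T: int, N: int) -> bool:
    """True when the VRL-SGD bound is below the Local SGD bound.
    Algebraically this is equivalent to T > N^3."""
    return vrl_sgd_order(T, N) < local_sgd_order(T, N)


if __name__ == "__main__":
    # Illustrative values only; T and N are not taken from the paper's experiments.
    for T, N in [(100, 8), (10**6, 8), (10**6, 64)]:
        print(T, N, round(local_sgd_order(T, N)), round(vrl_sgd_order(T, N)),
              vrl_bound_is_smaller(T, N))
```

The advantage therefore shows up in the regime of long training runs relative to the number of workers, which is the regime where an asymptotic statement in T is meaningful anyway.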
Abstract
To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants have been widely adopted, which apply multiple workers in parallel to speed up training. Among them, Local SGD has gained much attention due to its lower communication cost. Nevertheless, when the data distribution on workers is non-identical, Local SGD requires $O(T^{\frac{3}{4}} N^{\frac{3}{4}})$ communications to maintain its linear iteration speedup property, where $T$ is the total number of iterations and $N$ is the number of workers. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to further reduce the communication complexity. Benefiting from eliminating the dependency on the gradient variance among workers, we theoretically prove that VRL-SGD achieves a linear iteration speedup with a lower communication complexity $O(T^{\frac{1}{2}}…
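The effect of non-identical data on Local SGD can be illustrated with a small toy problem. The sketch below is not the paper's algorithm or experimental setup. It assumes two workers with hypothetical quadratic losses f_i(x) = 0.5 a_i (x - b_i)^2, uses exact gradients instead of stochastic ones so the run is reproducible, and averages the workers' models every k local steps. With `corrected=True`, each worker subtracts a running estimate of how far its gradient deviates from the average; this correction rule is an assumption written in the spirit of removing the gradient variance among workers, and should be checked against the paper before being read as VRL-SGD itself.

```python
# Toy setup (hypothetical): worker i holds f_i(x) = 0.5 * A[i] * (x - B[i])**2.
# Different (A[i], B[i]) per worker stands in for non-identical data.
A = [1.0, 3.0]
B = [-1.0, 1.0]
# Minimizer of the average loss (1/N) * sum_i f_i(x).
X_STAR = sum(a * b for a, b in zip(A, B)) / sum(A)


def grad(i: int, x: float) -> float:
    """Exact gradient of worker i's loss (no sampling noise)."""
    return A[i] * (x - B[i])


def run(total_iters: int, k: int, lr: float, corrected: bool) -> float:
    """Run local gradient steps with model averaging every k iterations.

    corrected=False: plain periodic averaging (Local SGD style).
    corrected=True:  each worker also subtracts a drift estimate `delta[i]`
                     (assumed correction rule, for illustration only).
    Returns the averaged model after the last communication round.
    """
    n = len(A)
    x_avg = 0.0
    xs = [x_avg] * n       # local models, one per worker
    delta = [0.0] * n      # per-worker drift estimates; they always sum to zero
    for t in range(1, total_iters + 1):
        # Local step on every worker, using only that worker's gradient.
        xs = [xs[i] - lr * (grad(i, xs[i]) - delta[i]) for i in range(n)]
        if t % k == 0:
            # Communication round: average the local models.
            x_avg = sum(xs) / n
            if corrected:
                # How far worker i drifted from the average over the last k
                # steps, expressed as an average gradient deviation.
                delta = [delta[i] + (x_avg - xs[i]) / (k * lr) for i in range(n)]
            xs = [x_avg] * n
    return x_avg


if __name__ == "__main__":
    plain = run(total_iters=2000, k=50, lr=0.01, corrected=False)
    fixed = run(total_iters=2000, k=50, lr=0.01, corrected=True)
    print("optimum:", X_STAR)
    print("periodic averaging only:", plain)
    print("with drift correction:", fixed)
```

In this toy setting, plain periodic averaging with k = 50 settles at a point visibly away from the optimum because each worker drifts toward its own minimizer between communications, while the corrected variant converges to the optimum with the same communication schedule. Reducing k shrinks the bias of the uncorrected run, which is the trade-off between communication and accuracy that the abstract describes.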
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Stochastic Gradient Optimization Techniques · Face and Expression Recognition
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Local SGD · Stochastic Gradient Descent
