Efficient Distributed SGD with Variance Reduction
Soham De, Tom Goldstein

TL;DR
This paper introduces CentralVR, a distributed stochastic gradient descent method that uses error corrections to reduce gradient variance. It scales linearly with the number of worker nodes, has provably linear convergence rates, and shows linear performance gains up to hundreds of cores on massive datasets.
Contribution
The paper presents CentralVR, a variance-reduced distributed SGD algorithm that applies error corrections to the gradient updates, scales linearly with the number of workers, and has provably linear convergence rates.
Findings
CentralVR scales linearly with the number of worker nodes.
CentralVR achieves provably linear convergence rates.
CentralVR outperforms existing methods in large-scale experiments.
Abstract
Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets. However, SGD suffers from two main drawbacks: (i) The noisy gradient updates have high variance, which slows down convergence as the iterates approach the optimum, and (ii) SGD scales poorly in distributed settings, typically experiencing rapidly decreasing marginal benefits as the number of workers increases. In this paper, we propose a highly parallel method, CentralVR, that uses error corrections to reduce the variance of SGD gradient updates, and scales linearly with the number of worker nodes. CentralVR enjoys low iteration complexity, provably linear convergence rates, and exhibits linear performance gains up to hundreds of cores for massive datasets. We compare CentralVR to state-of-the-art parallel stochastic optimization methods on…
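The abstract's idea of using error corrections to reduce gradient variance can be illustrated with a small single-machine sketch. This is not CentralVR itself, whose update rule is not given in this summary; it is a generic SVRG-style correction on a synthetic one-dimensional least-squares problem. All names (`make_data`, `plain_sgd`, `svrg`, `least_squares_optimum`), the step size, and the data are assumptions chosen for illustration.

```python
import random


def make_data(n=200, seed=0):
    """Synthetic 1-D least-squares data (assumed): b_i = 3 * a_i + Gaussian noise."""
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 1.5) for _ in range(n)]
    b = [3.0 * ai + rng.gauss(0.0, 1.0) for ai in a]
    return a, b


def grad_i(x, ai, bi):
    """Gradient of the single-sample loss 0.5 * (ai * x - bi) ** 2."""
    return ai * (ai * x - bi)


def least_squares_optimum(a, b):
    """Closed-form minimizer of the average loss, used only to measure error."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)


def plain_sgd(a, b, lr=0.1, epochs=30, seed=1):
    """Constant-step SGD: the gradient noise does not vanish at the optimum."""
    rng = random.Random(seed)
    n, x = len(a), 0.0
    for _ in range(epochs * n):
        i = rng.randrange(n)
        x -= lr * grad_i(x, a[i], b[i])
    return x


def svrg(a, b, lr=0.1, epochs=30, seed=1):
    """SVRG-style variance-reduced SGD (an illustrative stand-in, not CentralVR)."""
    rng = random.Random(seed)
    n, x = len(a), 0.0
    for _ in range(epochs):
        # Once per epoch: store a snapshot and the full gradient at that snapshot.
        snapshot = x
        full_grad = sum(grad_i(snapshot, a[i], b[i]) for i in range(n)) / n
        for _ in range(n):
            i = rng.randrange(n)
            # Corrected gradient: still unbiased, but its variance shrinks
            # as both x and the snapshot approach the optimum.
            g = grad_i(x, a[i], b[i]) - grad_i(snapshot, a[i], b[i]) + full_grad
            x -= lr * g
    return x


if __name__ == "__main__":
    a, b = make_data()
    x_star = least_squares_optimum(a, b)
    print("plain SGD error:", abs(plain_sgd(a, b) - x_star))
    print("SVRG-style error:", abs(svrg(a, b) - x_star))
```

With a constant step size, the plain SGD iterate keeps fluctuating around the optimum because each single-sample gradient stays noisy there, whereas the corrected gradient's noise vanishes as the iterate converges. That is the behavior drawback (i) in the abstract refers to. The distributed aspect of CentralVR is not modeled here.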
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
