Efficient Distributed SGD with Variance Reduction

Soham De; Tom Goldstein

arXiv:1512.02970 · cs.LG · April 10, 2017 · 1 citation

TL;DR

This paper introduces CentralVR, a distributed stochastic gradient descent method that uses error corrections to reduce the variance of gradient updates. It scales linearly with the number of worker nodes and has provably linear convergence rates.

Contribution

The paper presents CentralVR, a variance-reduced distributed SGD algorithm that scales linearly with the number of workers, has low iteration complexity, and converges at a provably linear rate.

Findings

01. CentralVR scales linearly with the number of worker nodes.

02. CentralVR achieves provably linear convergence rates.

03. CentralVR outperforms existing methods in large-scale experiments.

Abstract

Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets. However, SGD suffers from two main drawbacks: (i) The noisy gradient updates have high variance, which slows down convergence as the iterates approach the optimum, and (ii) SGD scales poorly in distributed settings, typically experiencing rapidly decreasing marginal benefits as the number of workers increases. In this paper, we propose a highly parallel method, CentralVR, that uses error corrections to reduce the variance of SGD gradient updates, and scales linearly with the number of worker nodes. CentralVR enjoys low iteration complexity, provably linear convergence rates, and exhibits linear performance gains up to hundreds of cores for massive datasets. We compare CentralVR to state-of-the-art parallel stochastic optimization methods on…
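To make drawback (i) and the idea of an error correction concrete, here is a minimal single-worker sketch. It is not CentralVR itself: the abstract does not spell out the algorithm, so the code uses a generic SVRG-style correction as a stand-in, on a toy 1-D least-squares problem. All names (`make_data`, `grad_i`, `svrg_style_sgd`, and so on), the data, the step size, and the epoch count are assumptions chosen for illustration.

```python
import random


def make_data(n=20):
    """Hypothetical toy data: b_i = 2 * a_i plus alternating +/-0.5 noise."""
    return [(1.0 + 0.1 * i, 2.0 * (1.0 + 0.1 * i) + (0.5 if i % 2 == 0 else -0.5))
            for i in range(n)]


def grad_i(w, a, b):
    """Gradient of the per-sample loss 0.5 * (a * w - b) ** 2 with respect to w."""
    return a * (a * w - b)


def full_grad(w, data):
    """Average of the per-sample gradients over the whole dataset."""
    return sum(grad_i(w, a, b) for a, b in data) / len(data)


def exact_minimizer(data):
    """Closed-form minimizer of the average loss (scalar least squares)."""
    return sum(a * b for a, b in data) / sum(a * a for a, _ in data)


def plain_sgd(data, step=0.05, epochs=50, seed=0):
    """Constant-step SGD: the gradient noise does not vanish near the optimum."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs * len(data)):
        a, b = data[rng.randrange(len(data))]
        w -= step * grad_i(w, a, b)
    return w


def svrg_style_sgd(data, step=0.05, epochs=50, seed=0):
    """SGD with an SVRG-style error correction (a stand-in, not CentralVR)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        snapshot = w                      # reference point, fixed for this epoch
        mu = full_grad(snapshot, data)    # exact average gradient at the snapshot
        for _ in range(len(data)):
            a, b = data[rng.randrange(len(data))]
            # The correction term (mu - grad_i at the snapshot) has zero mean, so
            # the update stays unbiased, and it shrinks as w and the snapshot
            # both approach the optimum.
            corrected = grad_i(w, a, b) - grad_i(snapshot, a, b) + mu
            w -= step * corrected
    return w


if __name__ == "__main__":
    data = make_data()
    w_star = exact_minimizer(data)
    print("plain SGD error:     ", abs(plain_sgd(data) - w_star))
    print("SVRG-style SGD error:", abs(svrg_style_sgd(data) - w_star))
```

With a constant step size, plain SGD keeps jittering around the minimizer because each sampled gradient is nonzero there, while the corrected update's noise shrinks as the iterate converges. The sketch says nothing about drawback (ii): distributing the work across nodes is the part specific to CentralVR and is not modeled here.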

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent