Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

Zhouyuan Huo; Heng Huang

arXiv:1604.03584·cs.LG·December 21, 2016·23 cites

TL;DR

This paper presents the first theoretical analysis of asynchronous stochastic variance reduced gradient (SVRG) algorithms for non-convex optimization, proving an $O(1/T)$ convergence rate and showing that linear speedup is achievable on both distributed-memory and shared-memory systems when the number of workers is upper bounded.

Contribution

It provides the first convergence analysis of asynchronous SVRG algorithms for non-convex problems, showing an $O(1/T)$ convergence rate and linear speedup in parallel systems when the number of workers is upper bounded.

Findings

01

Both algorithms achieve an $O(1/T)$ convergence rate.

02

Linear speedup is possible with a bounded number of workers.

03

First theoretical analysis of asynchronous SVRG on non-convex optimization.
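
For non-convex objectives, a rate such as $O(1/T)$ is typically stated for the expected squared gradient norm (convergence to a stationary point) rather than for the distance to a minimizer. The display below is only a sketch of the form such a bound usually takes; the constant $C$, the iterates $x_t$, and the objective $f = \frac{1}{n}\sum_{i=1}^{n} f_i$ are notation assumed here for illustration, not quoted from the paper.

$$
\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\left\|\nabla f(x_t)\right\|^2 \;\le\; \frac{C}{T}
$$

In the same assumed notation, a variance-reduced step evaluated at a possibly stale iterate $x_{t-\tau}$, with snapshot $\tilde{x}$, sampled index $i_t$, and step size $\eta$, can be written as:

$$
v_t = \nabla f_{i_t}(x_{t-\tau}) - \nabla f_{i_t}(\tilde{x}) + \nabla f(\tilde{x}), \qquad x_{t+1} = x_t - \eta\, v_t
$$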

Abstract

We provide the first theoretical analysis of the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on non-convex optimization. Recent studies have shown that asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge at a linear rate on convex problems. However, no existing work analyzes asynchronous SGD with variance reduction on non-convex problems. In this paper, we study two asynchronous parallel implementations of SVRG: one on a distributed memory system and the other on a shared memory system. We prove that both algorithms obtain a convergence rate of $O(1/T)$, and that linear speedup is achievable if the number of workers is upper bounded. Versions v1, v2, and v3 have been withdrawn due to a reference issue; please refer to the newest version, v4.
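
To make the idea in the abstract concrete, here is a minimal single-process sketch of SVRG in which each update uses a gradient evaluated at a stale iterate, imitating the delayed reads of asynchronous workers. It is not the paper's distributed-memory or shared-memory algorithm: the toy objective (squared loss on a sigmoid output, which is non-convex in `x`), the function names, the step size `lr`, and the delay bound `max_delay` are all assumptions chosen for illustration.

```python
import numpy as np
from collections import deque


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_i(x, a_i, b_i):
    # Gradient of the toy loss f_i(x) = 0.5 * (sigmoid(a_i . x) - b_i)^2.
    p = sigmoid(a_i @ x)
    return (p - b_i) * p * (1.0 - p) * a_i


def full_grad(x, A, b):
    # Gradient of f(x) = (1/n) * sum_i f_i(x).
    p = sigmoid(A @ x)
    return A.T @ ((p - b) * p * (1.0 - p)) / len(b)


def delayed_svrg(A, b, epochs=20, inner=None, lr=0.1, max_delay=4, seed=0):
    """Serial SVRG with simulated staleness (hypothetical helper, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = inner or n
    x = np.zeros(d)
    norms = [np.linalg.norm(full_grad(x, A, b))]
    for _ in range(epochs):
        snapshot = x.copy()
        mu = full_grad(snapshot, A, b)  # full gradient at the snapshot
        # Keep the last max_delay + 1 iterates so a "worker" can read an old one.
        history = deque([x.copy()], maxlen=max_delay + 1)
        for _ in range(inner):
            tau = int(rng.integers(0, len(history)))  # simulated delay
            x_stale = history[-1 - tau]
            i = int(rng.integers(0, n))
            # Variance-reduced gradient, evaluated at the stale iterate.
            v = grad_i(x_stale, A[i], b[i]) - grad_i(snapshot, A[i], b[i]) + mu
            x = x - lr * v
            history.append(x.copy())
        norms.append(np.linalg.norm(full_grad(x, A, b)))
    return x, norms


# Usage on synthetic data: track the full gradient norm after each epoch.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
b = sigmoid(A @ rng.normal(size=5))
x_hat, norms = delayed_svrg(A, b)
print(norms[0], norms[-1])
```

Setting `max_delay=0` recovers plain serial SVRG, which makes it easy to compare how the gradient norm evolves with and without the simulated delay. The sketch says nothing about wall-clock speedup, since no real parallelism is involved.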

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent