Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Xiangru Lian; Yijun Huang; Yuncheng Li; Ji Liu

arXiv:1506.08272·math.OC·April 22, 2019·214 cites

TL;DR

This paper provides theoretical analysis of asynchronous parallel stochastic gradient methods for nonconvex optimization, establishing convergence rates and conditions for linear speedup in deep learning contexts.

Contribution

It studies two asynchronous parallel SG implementations for nonconvex problems, one over a computer network and one on a shared-memory system, and proves convergence and speedup guarantees for both.

Findings

01 Ergodic convergence rate of O(1/√K) for both algorithms, where K is the total number of iterations

02 Linear speedup is achievable when the number of workers is bounded by √K

03 Generalizes and improves existing analysis for convex minimization

Abstract

Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in training deep neural networks and have achieved many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one over a computer network and the other on a shared-memory system. We establish an ergodic convergence rate of $O(1/\sqrt{K})$ for both algorithms and prove that linear speedup is achievable if the number of workers is bounded by $\sqrt{K}$ ($K$ is the total number of iterations). Our results generalize and improve existing analysis for convex minimization.
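To make the asynchronous mechanism concrete, here is a minimal sketch that imitates it serially: each update uses a noisy gradient evaluated at a stale iterate, as would happen when a worker reads the parameters and other workers update them before its gradient is applied. This is an illustration only, not either of the paper's two algorithms. The toy objective, the uniformly random delay model, and the names `grad`, `delayed_sgd`, `max_delay`, and `noise_std` are all assumptions introduced here. The returned quantity, the average squared gradient over all iterates, is the kind of measure an ergodic rate refers to.

```python
import math
import random
from collections import deque


def grad(x):
    """Gradient of the toy nonconvex objective f(x) = x**2 + 3*sin(x)**2 (assumed for illustration)."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def delayed_sgd(num_iters, max_delay, step, noise_std=1.0, x0=3.0, seed=0):
    """Serially simulate SG with stale gradients and return the average squared gradient.

    Each update uses a gradient evaluated at an iterate that is between 0 and
    max_delay steps old, which mimics the staleness of asynchronous workers.
    """
    rng = random.Random(seed)
    # Keep only the most recent max_delay + 1 iterates, newest last.
    history = deque([x0], maxlen=max_delay + 1)
    x = x0
    sq_grad_sum = 0.0
    for _ in range(num_iters):
        # Track the true squared gradient at the current iterate.
        sq_grad_sum += grad(x) ** 2
        # Choose how stale the iterate seen by the "worker" is (hypothetical delay model).
        delay = rng.randint(0, len(history) - 1)
        stale_x = history[-1 - delay]
        # Stochastic gradient: true gradient at the stale iterate plus Gaussian noise.
        g = grad(stale_x) + rng.gauss(0.0, noise_std)
        x -= step * g
        history.append(x)
    return sq_grad_sum / num_iters


if __name__ == "__main__":
    for k in (100, 1000, 10000):
        print(k, delayed_sgd(num_iters=k, max_delay=4, step=0.01))
```

Running the script prints the averaged squared gradient for a few iteration counts. The step size is kept small relative to the delay because stale gradients can destabilize the iteration when the step is too large.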

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM