Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu

TL;DR
This paper provides theoretical analysis of asynchronous parallel stochastic gradient methods for nonconvex optimization, establishing convergence rates and conditions for linear speedup in deep learning contexts.
Contribution
It introduces two asynchronous parallel SG algorithms for nonconvex problems and proves their convergence and speedup properties, filling theoretical gaps.
Findings
Ergodic convergence rate of O(1/√K) for both algorithms
Linear speedup achievable when number of workers ≤ √K
Generalizes existing convex optimization analysis
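For nonconvex objectives, an ergodic rate is usually read as a bound on the squared gradient norm averaged over the iterates, rather than on the distance to a minimizer. A sketch of that reading follows; the objective f, the iterates x_k, and the constant C are notation assumed here for illustration and are not taken from this page:

```latex
\frac{1}{K}\sum_{k=1}^{K} \mathbb{E}\,\bigl\|\nabla f(x_k)\bigr\|^{2} \;\le\; \frac{C}{\sqrt{K}}
```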
Abstract
Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in training deep neural networks and have achieved many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one over a computer network and the other on a shared memory system. We establish an ergodic convergence rate of O(1/√K) for both algorithms and prove that linear speedup is achievable if the number of workers is bounded by √K (K is the total number of iterations). Our results generalize and improve existing analysis for convex minimization.
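The effect of asynchrony can be illustrated with a small single-process simulation in which each update uses a gradient computed at a stale iterate. This is a minimal sketch under stated assumptions, not either of the paper's two algorithms: the toy objective, the uniform random delay model, and the names `grad` and `async_sgd` are all hypothetical choices made for illustration.

```python
import math
import random


def grad(x):
    # Gradient of the toy nonconvex objective f(x) = x**2 + 3*sin(x)**2
    # (an assumed example; its only stationary point is x = 0).
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def async_sgd(num_iters, max_delay, step, noise=0.1, seed=0):
    """Simulate SG with stale gradients; return the average squared gradient norm.

    Each update applies a noisy gradient evaluated at an iterate that is up to
    `max_delay` steps old, mimicking a worker that read the parameters earlier.
    """
    rng = random.Random(seed)
    history = [2.0]  # history[k] is the iterate x_k; x_0 = 2.0
    sq_norms = []
    for k in range(num_iters):
        # Assumed delay model: uniform over 0..max_delay, capped by k.
        delay = rng.randint(0, min(max_delay, k))
        stale_x = history[k - delay]
        g = grad(stale_x) + rng.gauss(0.0, noise)  # stochastic, stale gradient
        x = history[-1]
        sq_norms.append(grad(x) ** 2)  # true gradient norm at the current iterate
        history.append(x - step * g)
    return sum(sq_norms) / len(sq_norms)


if __name__ == "__main__":
    for K in (200, 2000):
        print(K, async_sgd(num_iters=K, max_delay=4, step=0.01))
```

The returned quantity is the empirical counterpart of the ergodic criterion: it averages the squared gradient norm over all K iterates, so running longer with a bounded delay should drive it down. The step size here is fixed for simplicity; it is not the schedule analyzed in the paper.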
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
