Convergence of Gradient Algorithms for Nonconvex C^{1+alpha} Cost Functions
Zixuan Wang, Shanjian Tang

TL;DR
This paper analyzes the convergence of various stochastic gradient algorithms with momentum in nonconvex settings, relaxing previous assumptions and extending results to stochastic stepsizes.
Contribution
It provides a unified convergence analysis for momentum-based stochastic gradient methods under mild conditions, covering Hölder continuous gradients and stochastic stepsizes.
Findings
Proves almost sure convergence without additional restrictions on the objective function or the stepsize
Extends convergence results from Lipschitz to Hölder continuous gradients
Includes stochastic stepsizes in the convergence analysis via a localization procedure
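For reference, the Hölder condition that replaces the Lipschitz assumption, and the standard inequality it yields in place of the usual quadratic upper bound, can be written as follows. This is a textbook statement rather than a formula quoted from the paper; L and α are generic symbols, with α = 1 recovering the Lipschitz case.

```latex
% alpha-Hölder continuous gradient, alpha in (0, 1]
\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|^{\alpha}, \qquad \forall\, x, y.

% Generalized descent lemma, obtained by integrating along the segment from x to y:
f(y) - f(x) - \langle \nabla f(x),\, y - x \rangle
  = \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x),\, y - x \rangle \, dt
  \le \int_0^1 L\, t^{\alpha} \|y - x\|^{1+\alpha} \, dt
  = \frac{L}{1+\alpha}\, \|y - x\|^{1+\alpha}.
```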
Abstract
This paper is concerned with the convergence of stochastic gradient algorithms with momentum terms in the nonconvex setting. A class of stochastic momentum methods, including stochastic gradient descent, heavy ball, and Nesterov's accelerated gradient, is analyzed in a general framework under mild assumptions. Building on a convergence result for the expected gradients, we prove almost sure convergence through a detailed analysis of the effects of momentum and of the number of upcrossings. Notably, no additional restrictions are imposed on the objective function or the stepsize. Another improvement over previous results is that the Lipschitz continuity of the gradient assumed in existing work is relaxed to Hölder continuity. As a byproduct, we apply a localization procedure to extend our results to stochastic stepsizes.
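To make the idea of one framework covering SGD, heavy ball, and Nesterov concrete, the sketch below uses one common unified momentum parametrization. This is an assumption on our part and not necessarily the scheme analyzed in the paper: with y_{k+1} = x_k - lr*g_k and ys_{k+1} = x_k - s*lr*g_k, the update x_{k+1} = y_{k+1} + beta*(ys_{k+1} - ys_k) gives heavy ball for s = 0, Nesterov for s = 1, and plain SGD for beta = 0. The test function f(x) = sum |x_i|^(1+alpha)/(1+alpha), the Gaussian gradient noise, and the stepsize schedule are all hypothetical choices; the gradient of f is alpha-Hölder continuous but not Lipschitz at the origin.

```python
import numpy as np


def grad_f(x, alpha):
    # Gradient of the hypothetical test function f(x) = sum |x_i|^(1+alpha) / (1+alpha).
    # sign(x) * |x|^alpha is alpha-Hölder continuous but not Lipschitz at 0 when alpha < 1.
    return np.sign(x) * np.abs(x) ** alpha


def stochastic_momentum(x0, alpha=0.5, beta=0.9, s=1.0, steps=20000,
                        noise=0.1, seed=0):
    """Hypothetical unified stochastic momentum scheme (an assumption, not the paper's).

    s = 0 -> heavy ball, s = 1 -> Nesterov, beta = 0 -> plain SGD.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    ys_prev = x.copy()  # previous shifted iterate ys_k, initialized to x_0
    for k in range(steps):
        # Diminishing stepsize: sum lr = inf, and for alpha = 0.5 also
        # sum lr^(1+alpha) < inf since 0.75 * 1.5 > 1 (illustrative schedule).
        lr = 0.1 / (k + 1) ** 0.75
        # Unbiased noisy gradient: true gradient plus simulated Gaussian noise.
        g = grad_f(x, alpha) + noise * rng.standard_normal(x.shape)
        y = x - lr * g           # plain gradient step
        ys = x - s * lr * g      # shifted step selecting the momentum variant
        x = y + beta * (ys - ys_prev)
        ys_prev = ys
    return x
```

For example, `stochastic_momentum([2.0, -1.5], s=0.0)` runs the heavy-ball variant and `stochastic_momentum([2.0, -1.5], beta=0.0)` runs plain SGD. The names `grad_f` and `stochastic_momentum`, the noise level, and the schedule `0.1 / (k + 1) ** 0.75` are illustrative only.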
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Optimization and Variational Analysis
