Convergence of Gradient Algorithms for Nonconvex C^{1+alpha} Cost Functions

Zixuan Wang; Shanjian Tang

arXiv:2012.00628 · math.OC · October 1, 2021

Open Access

TL;DR

This paper analyzes the convergence of stochastic gradient algorithms with momentum in nonconvex settings, relaxing the usual Lipschitz condition on the gradient to Hölder continuity and extending the results to stochastic stepsizes.

Contribution

It provides a unified convergence analysis for momentum-based stochastic gradient methods under mild conditions, including Hölder continuity of gradients and stochastic stepsizes.

Findings

01

Proves almost sure convergence without additional restrictions on the objective function or stepsize

02

Extends convergence results to Hölder continuous gradients

03

Includes stochastic stepsizes in the convergence analysis
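
For readers unfamiliar with the term, the Hölder condition on the gradient referred to above is usually stated as follows. The symbols L (the Hölder constant) and \alpha (the exponent) are standard notation assumed here, not taken from this page; the C^{1+alpha} in the title presumably refers to this exponent.

```latex
% Hölder continuity of the gradient with exponent \alpha \in (0, 1]:
\[
  \|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\|^{\alpha}
  \qquad \text{for all } x, y \in \mathbb{R}^{d}.
\]
% The case \alpha = 1 recovers the usual Lipschitz condition on \nabla f.
% Example: f(x) = |x|^{3/2} on \mathbb{R} has f'(x) = \tfrac{3}{2}\,\mathrm{sign}(x)\,|x|^{1/2},
% which is Hölder continuous with \alpha = 1/2 but not Lipschitz near x = 0.
```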

Abstract

This paper is concerned with the convergence of stochastic gradient algorithms with momentum terms in the nonconvex setting. A class of stochastic momentum methods, including stochastic gradient descent, heavy ball, and Nesterov's accelerated gradient, is analyzed in a general framework under mild assumptions. Building on the convergence of expected gradients, we prove almost sure convergence through a detailed discussion of the effects of momentum and the number of upcrossings. Notably, no additional restrictions are imposed on the objective function or the stepsize. Another improvement over previous results is that the usual Lipschitz condition on the gradient is relaxed to Hölder continuity. As a byproduct, we apply a localization procedure to extend our results to stochastic stepsizes.
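
To make the class of methods named in the abstract concrete, here is a minimal sketch of one common way to write stochastic gradient descent, heavy ball, and Nesterov's accelerated gradient as a single update rule. It is an illustration under stated assumptions, not the framework used in the paper: the function name `stochastic_momentum`, the `lookahead` switch, the test objective |x|^{1.5}, the Gaussian noise model, and the stepsize schedule are all hypothetical choices made for this example.

```python
import math
import random


def f(x):
    # Illustrative objective (an assumption): its derivative is Hölder
    # continuous with exponent 1/2 but not Lipschitz near x = 0.
    return abs(x) ** 1.5


def grad_f(x):
    # Derivative of |x|^{1.5}: 1.5 * sign(x) * sqrt(|x|).
    return 1.5 * math.copysign(math.sqrt(abs(x)), x)


def stochastic_momentum(x0, beta, lookahead, n_steps=5000, noise_std=0.1, seed=0):
    """One parametrization covering three methods (hypothetical helper).

    beta = 0                  -> plain stochastic gradient descent
    beta > 0, lookahead = 0.0 -> heavy ball
    beta > 0, lookahead = 1.0 -> Nesterov-style momentum
    """
    rng = random.Random(seed)
    x, v = float(x0), 0.0
    for k in range(n_steps):
        # Decaying deterministic stepsize, chosen only for illustration.
        step = 0.2 / (k + 1) ** 0.75
        # Nesterov evaluates the gradient at a look-ahead point.
        point = x + lookahead * beta * v
        # Noisy gradient: true gradient plus zero-mean Gaussian noise.
        g = grad_f(point) + rng.gauss(0.0, noise_std)
        v = beta * v - step * g
        x = x + v
    return x


x_start = 2.0
final_values = {
    "sgd": f(stochastic_momentum(x_start, beta=0.0, lookahead=0.0)),
    "heavy_ball": f(stochastic_momentum(x_start, beta=0.9, lookahead=0.0)),
    "nesterov": f(stochastic_momentum(x_start, beta=0.9, lookahead=1.0)),
}

for name, value in final_values.items():
    print(f"{name}: final objective value {value:.3e}")
```

All three variants share the same loop and differ only in `beta` and `lookahead`, which is the sense in which a single analysis can cover them. The objective here is convex and one-dimensional purely to keep the sketch short; the paper's setting is nonconvex.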

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Optimization and Variational Analysis