Almost Sure Saddle Avoidance of Stochastic Gradient Methods without the Bounded Gradient Assumption

Jun Liu; Ye Yuan

arXiv:2302.07862·cs.LG·February 16, 2023

TL;DR

This paper proves that several stochastic gradient methods, including SGD, SHB, and SNAG, almost surely avoid strict saddle points without requiring bounded gradients, under more practical assumptions relevant to neural network training.

Contribution

It establishes almost sure saddle avoidance for SHB and SNAG, which the authors believe to be the first such result, and extends prior analysis of SGD by removing the bounded gradient and uniformly bounded noise assumptions.

Findings

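01 SGD, SHB, and SNAG almost surely avoid any strict saddle manifold.

02 A local boundedness assumption on the noisy gradient replaces the bounded gradient and uniformly bounded noise assumptions.

03 The new assumption is naturally satisfied in empirical risk minimization problems, such as those typically seen in neural network training.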
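To make "strict saddle" concrete, here is a minimal sketch that tests a point against the common textbook definition: a critical point whose Hessian has at least one strictly negative eigenvalue. The toy objective f(x, y) = 0.5*x^2 + 0.25*y^4 - 0.5*y^2 and the helper names grad, hessian, and is_strict_saddle are illustrative assumptions, not taken from the paper, whose precise notion of a strict saddle manifold may differ.

```python
import numpy as np

# Toy objective (an assumption for illustration, not from the paper):
#   f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2
# It has a saddle at (0, 0) and minima at (0, 1) and (0, -1).

def grad(p):
    """Gradient of the toy objective."""
    x, y = p
    return np.array([x, y**3 - y])

def hessian(p):
    """Hessian of the toy objective (diagonal for this function)."""
    _, y = p
    return np.array([[1.0, 0.0], [0.0, 3.0 * y**2 - 1.0]])

def is_strict_saddle(p, tol=1e-8):
    """True if p is a critical point whose Hessian has a negative eigenvalue."""
    p = np.asarray(p, dtype=float)
    is_critical = np.linalg.norm(grad(p)) < tol
    min_eig = np.linalg.eigvalsh(hessian(p)).min()
    return bool(is_critical and min_eig < -tol)

if __name__ == "__main__":
    print(is_strict_saddle([0.0, 0.0]))  # saddle of the toy objective
    print(is_strict_saddle([0.0, 1.0]))  # local minimum of the toy objective
```

Note that the gradient of this toy objective grows cubically in y, so it is not globally bounded, which is the kind of setting the paper targets.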

Abstract

We prove that various stochastic gradient descent methods, including the stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG) methods, almost surely avoid any strict saddle manifold. To the best of our knowledge, this is the first time such results are obtained for SHB and SNAG methods. Moreover, our analysis expands upon previous studies on SGD by removing the need for bounded gradients of the objective function and uniformly bounded noise. Instead, we introduce a more practical local boundedness assumption for the noisy gradient, which is naturally satisfied in empirical risk minimization problems typically seen in training of neural networks.
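The behaviour described in the abstract can be illustrated with a small simulation. The sketch below runs the standard textbook forms of the SGD, heavy-ball, and Nesterov updates, started exactly at a strict saddle of a toy objective. Everything specific here is an assumption made for illustration: the objective, the constant step size lr, the momentum beta, and the noise model in noisy_grad, whose magnitude grows with the norm of the iterate so that it is bounded on bounded sets but not uniformly bounded. This is one plausible reading of a local boundedness condition, not the paper's exact assumption, and the paper's update rules and step-size schedule may differ.

```python
import numpy as np

def grad(p):
    """Gradient of the toy objective f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2."""
    x, y = p
    return np.array([x, y**3 - y])

def noisy_grad(p, rng, scale=0.1):
    """Gradient plus noise (hypothetical model, not from the paper).

    The noise magnitude is scale * (1 + ||p||): bounded on any bounded set,
    but not uniformly bounded over the whole space.
    """
    xi = rng.uniform(-1.0, 1.0, size=2)
    return grad(p) + scale * (1.0 + np.linalg.norm(p)) * xi

def run(method, steps=5000, lr=0.02, beta=0.9, seed=0):
    """Run one stochastic method from the saddle (0, 0); return the final iterate."""
    rng = np.random.default_rng(seed)
    p = np.zeros(2)        # start exactly at the strict saddle
    p_prev = p.copy()
    for _ in range(steps):
        if method == "sgd":
            p_next = p - lr * noisy_grad(p, rng)
        elif method == "shb":
            # Heavy ball: gradient step at p plus a momentum term.
            p_next = p - lr * noisy_grad(p, rng) + beta * (p - p_prev)
        elif method == "snag":
            # Nesterov: gradient evaluated at the extrapolated point q.
            q = p + beta * (p - p_prev)
            p_next = q - lr * noisy_grad(q, rng)
        else:
            raise ValueError(f"unknown method: {method}")
        p_prev, p = p, p_next
    return p

if __name__ == "__main__":
    for name in ("sgd", "shb", "snag"):
        print(name, run(name))
```

In this toy setting the noise pushes the iterate off the saddle along the negative-curvature direction y, and each method then settles near one of the minima at (0, 1) or (0, -1). A single simulated run is only an illustration and proves nothing about almost sure behaviour.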

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent