Almost Sure Saddle Avoidance of Stochastic Gradient Methods without the Bounded Gradient Assumption
Jun Liu, Ye Yuan

TL;DR
This paper proves that several stochastic gradient methods, including SGD, SHB, and SNAG, almost surely avoid strict saddle points without requiring bounded gradients, under more practical assumptions relevant to neural network training.
Contribution
It establishes almost sure saddle avoidance for SHB and SNAG, and extends prior analyses of SGD by removing the assumptions of bounded gradients of the objective function and uniformly bounded noise.
Findings
SGD, SHB, and SNAG almost surely avoid strict saddle points.
Introduces a local boundedness assumption for noisy gradients.
The local boundedness assumption is naturally satisfied in empirical risk minimization problems, such as those that arise when training neural networks.
Abstract
We prove that various stochastic gradient descent methods, including the stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG) methods, almost surely avoid any strict saddle manifold. To the best of our knowledge, this is the first time such results are obtained for SHB and SNAG methods. Moreover, our analysis expands upon previous studies on SGD by removing the need for bounded gradients of the objective function and uniformly bounded noise. Instead, we introduce a more practical local boundedness assumption for the noisy gradient, which is naturally satisfied in empirical risk minimization problems typically seen in training of neural networks.
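To make the three methods named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's construction. It runs common textbook forms of SGD, SHB, and SNAG on a hypothetical toy objective f(x, y) = 0.5*x^2 - 0.5*y^2 + 0.25*y^4, which has a strict saddle at the origin, minima at y = ±1, and an unbounded (cubic) gradient. The additive Gaussian noise, the step size, the momentum value, and the function names are all assumptions chosen for illustration; the paper's exact update rules and noise model may differ, and a single finite run only illustrates the almost sure statement rather than verifying it.

```python
import random


def grad(x, y):
    """Gradient of the toy objective f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4."""
    return x, y ** 3 - y


def noisy_grad(x, y, rng, sigma):
    """True gradient plus additive Gaussian noise (an assumed noise model)."""
    gx, gy = grad(x, y)
    return gx + rng.gauss(0.0, sigma), gy + rng.gauss(0.0, sigma)


def run(method, steps=2000, lr=0.05, beta=0.9, sigma=0.05, seed=0):
    """Run one trajectory started exactly at the strict saddle (0, 0)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0      # current iterate
    px, py = x, y        # previous iterate, used by the momentum methods
    for _ in range(steps):
        if method == "sgd":
            gx, gy = noisy_grad(x, y, rng, sigma)
            nx, ny = x - lr * gx, y - lr * gy
        elif method == "shb":
            # Heavy ball: gradient step plus a momentum term beta * (x_k - x_{k-1}).
            gx, gy = noisy_grad(x, y, rng, sigma)
            nx = x - lr * gx + beta * (x - px)
            ny = y - lr * gy + beta * (y - py)
        elif method == "snag":
            # Nesterov: evaluate the noisy gradient at the extrapolated point.
            lx, ly = x + beta * (x - px), y + beta * (y - py)
            gx, gy = noisy_grad(lx, ly, rng, sigma)
            nx, ny = lx - lr * gx, ly - lr * gy
        else:
            raise ValueError(f"unknown method: {method}")
        px, py, x, y = x, y, nx, ny
    return x, y


for name in ("sgd", "shb", "snag"):
    x_final, y_final = run(name)
    print(f"{name}: final iterate = ({x_final:.3f}, {y_final:.3f})")
```

In this sketch the noise pushes each trajectory off the saddle along the negative-curvature direction y, after which the iterates settle near one of the two minima. Which minimum is reached depends on the random seed.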
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
