On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems
Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher

TL;DR
This paper provides a comprehensive analysis of stochastic gradient descent (SGD) in non-convex problems, establishing almost sure convergence, avoidance of saddle points, and convergence rates, with practical insights for step-size tuning.
Contribution
It proves almost sure convergence of SGD in non-convex settings, shows SGD avoids saddle points with probability 1, and derives convergence rates for Hurwicz minimizers under various step-size schedules.
Findings
SGD iterates remain bounded and converge with probability 1.
SGD avoids strict saddle points with probability 1.
Convergence rate to Hurwicz minimizers is O(1/n^p) with step-size Θ(1/n^p).
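The findings above can be illustrated with a small numerical sketch. It is not taken from the paper: the test function f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2, the Gaussian noise model, and the values of `gamma`, `p`, and `sigma` are all assumptions chosen for illustration. The function has a strict saddle at (0, 0) and minimizers at (±1, 0); SGD is started exactly at the saddle and run with a step-size of gamma / n^p.

```python
import random


def grad(x, y):
    """Gradient of the illustrative function f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2."""
    return x**3 - x, y


def sgd(n_steps=20000, gamma=0.5, p=0.75, sigma=0.1, seed=0):
    """Run SGD with step-size gamma / n^p, starting at the strict saddle (0, 0).

    All parameter values are hypothetical defaults, not values from the paper.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0  # (0, 0) is a strict saddle of f
    for n in range(1, n_steps + 1):
        step = gamma / n**p  # vanishing step-size of order 1/n^p
        gx, gy = grad(x, y)
        # Noisy gradient: exact gradient plus zero-mean Gaussian noise.
        x -= step * (gx + sigma * rng.gauss(0.0, 1.0))
        y -= step * (gy + sigma * rng.gauss(0.0, 1.0))
    return x, y


if __name__ == "__main__":
    x, y = sgd()
    print(f"final iterate: ({x:.4f}, {y:.4f})")
```

In this toy setting the gradient noise pushes the iterate off the saddle, after which it settles near one of the two minimizers; which one it reaches depends on the random seed.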
Abstract
This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability 1 under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability 1 for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is O(1/n^p) if the method is employed with a Θ(1/n^p) step-size schedule. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic…
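One way a cool-down phase with a vanishing step-size could be written is sketched below. The abstract does not specify the exact schedule, so the form used here (a constant step up to a switch point, then decay proportional to 1/n^p) and the names `cooldown_step`, `n_switch`, `gamma`, and `p` are assumptions made for illustration.

```python
def cooldown_step(n, gamma=0.1, n_switch=100, p=1.0):
    """Hypothetical step-size schedule with a cool-down phase.

    Uses a constant step `gamma` for iterations 1..n_switch, then decays as
    gamma * (n_switch / n)^p, which is of order 1/n^p and matches the
    constant phase at n = n_switch.
    """
    if n <= n_switch:
        return gamma
    return gamma * (n_switch / n) ** p


if __name__ == "__main__":
    for n in (1, 100, 200, 400):
        print(n, cooldown_step(n))
```

Scaling by `n_switch / n` rather than `1 / n` keeps the schedule continuous at the switch, so the step-size does not drop abruptly when the cool-down begins.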
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs
Methods: 1x1 Convolution · Bottleneck Residual Block · Batch Normalization · Average Pooling · Max Pooling · Global Average Pooling · Residual Connection · Kaiming Initialization · Convolution
