Stochastic Subgradient Descent on a Generic Definable Function Converges to a Minimizer
Sholom Schechtman

TL;DR
This paper proves that stochastic subgradient descent (SGD) converges almost surely to a local minimizer of a generic definable function. It extends previous results in two ways. First, it characterizes the Clarke critical points of such functions without assuming weak convexity. Second, it shows that SGD avoids the newly identified sharply repulsive critical points.
Contribution
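It introduces sharply repulsive critical points, a type of Clarke critical point that can arise when the function is not weakly convex. It then shows that SGD avoids these points with probability one under a density-like assumption on the perturbation sequence, which guarantees convergence to local minima.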
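Here is a hypothetical one-dimensional example of what changes without weak convexity. It is our own illustration and does not come from the paper. Consider the semialgebraic function f(x) = x^2 - |x|. A short computation shows that f is not weakly convex and has a Clarke critical point that is not a local minimum. We do not check this point against the paper's formal definition of a sharply repulsive critical point.

```latex
f(x) = x^2 - |x|, \qquad x \in \mathbb{R}.

\partial_C f(x) = \{\, 2x - \operatorname{sign}(x) \,\} \quad (x \neq 0),
\qquad \partial_C f(0) = [-1,\, 1] \ni 0
\;\Longrightarrow\; 0 \text{ is Clarke critical.}

f(x) = |x|\,(|x| - 1) < 0 = f(0) \quad \text{for } 0 < |x| < 1
\;\Longrightarrow\; 0 \text{ is a strict local maximum, not a minimum.}

f(x) = \bigl(|x| - \tfrac{1}{2}\bigr)^2 - \tfrac{1}{4}
\;\Longrightarrow\; \text{minimizers } x = \pm\tfrac{1}{2}, \quad f(\pm\tfrac{1}{2}) = -\tfrac{1}{4}.

f(x) + \tfrac{\rho}{2}x^2 = \bigl(1 + \tfrac{\rho}{2}\bigr)x^2 - |x|
\text{ has a concave kink at } 0 \text{ for every } \rho \ge 0
\;\Longrightarrow\; f \text{ is not weakly convex.}
```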
Findings
SGD avoids sharply repulsive critical points with probability one.
The corresponding active manifolds satisfy the Verdier and angle conditions.
SGD converges to a local minimum of a generic definable function.
Abstract
Davis and Drusvyatskiy previously showed that every Clarke critical point of a generic, semialgebraic (and more generally, definable in an o-minimal structure), weakly convex function lies on an active manifold and is either a local minimum or an active strict saddle. In the first part of this work, we show that when the weak convexity assumption fails, a third type of point appears: a sharply repulsive critical point. Moreover, we show that the corresponding active manifolds satisfy the Verdier and angle conditions, which we introduced in our previous work. In the second part of this work, we show that, under a density-like assumption on the perturbation sequence, stochastic subgradient descent (SGD) avoids sharply repulsive critical points with probability one. We show that such a density-like assumption can be obtained by adding a small random…
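The abstract says a random perturbation lets SGD escape critical points that are not minimizers. The sketch below illustrates this on a toy problem. It is our own illustration, not the paper's algorithm or experiments. The following are all assumptions:

* the objective f(x) = x^2 - |x|;
* the step sizes gamma_k = 0.5 / (k + 1)^0.75;
* the i.i.d. Gaussian noise eta.

The abstract's description of how the density-like assumption is obtained is truncated, so we make no claim that Gaussian noise matches the paper's construction. The loop follows the standard stochastic subgradient recursion x_{k+1} = x_k - gamma_k (y_k + eta_{k+1}), where y_k is a Clarke subgradient of f at x_k.

```python
import random


def f(x: float) -> float:
    """Toy nonsmooth, non-weakly-convex objective f(x) = x^2 - |x| (illustrative choice)."""
    return x * x - abs(x)


def clarke_subgradient(x: float) -> float:
    """Return one element of the Clarke subdifferential of f at x.

    Away from 0, f is differentiable with derivative 2x - sign(x). At x = 0 the
    Clarke subdifferential is [-1, 1]; we return 0, the worst-case choice that
    makes x = 0 a fixed point of the noiseless method.
    """
    if x > 0:
        return 2.0 * x - 1.0
    if x < 0:
        return 2.0 * x + 1.0
    return 0.0


def sgd(x0: float, n_iter: int = 20_000, step0: float = 0.5,
        noise_std: float = 0.1, seed: int = 0) -> float:
    """Run x_{k+1} = x_k - gamma_k * (y_k + eta_{k+1}) and return the last iterate.

    gamma_k = step0 / (k + 1)**0.75, so sum(gamma_k) diverges while
    sum(gamma_k**2) converges; eta_{k+1} is i.i.d. Gaussian with standard
    deviation noise_std (an assumption, not necessarily the paper's noise model).
    """
    rng = random.Random(seed)
    x = float(x0)
    for k in range(n_iter):
        gamma = step0 / (k + 1) ** 0.75
        eta = noise_std * rng.gauss(0.0, 1.0)
        x -= gamma * (clarke_subgradient(x) + eta)
    return x


if __name__ == "__main__":
    # Noiseless run started at the critical point 0: the iterate never moves.
    print("noiseless, started at 0:", sgd(0.0, noise_std=0.0))
    # Perturbed run started at 0: the iterate settles near +0.5 or -0.5.
    print("noisy, started at 0:    ", sgd(0.0, noise_std=0.1))
```

The two runs behave differently. Without noise, an oracle that returns 0 at x = 0 keeps the method stuck at a local maximum. With noise, the method drifts to one of the minimizers ±1/2. The example only shows escape from one non-minimizing critical point. It says nothing about the genericity or definability hypotheses under which the paper's result holds.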
Taxonomy
Topics: Advanced Topology and Set Theory · Topological and Geometric Data Analysis · Computability, Logic, AI Algorithms
Methods: Stochastic Gradient Descent
