Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks
Jing An, Jianfeng Lu

TL;DR
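This paper proves that, with positive probability, stochastic gradient descent converges locally for non-convex objectives that satisfy the local Lojasiewicz condition together with an additional local structural assumption on the loss landscape, and it gives finite-width neural networks for which these assumptions hold.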
Contribution
It establishes local convergence of SGD under the local Lojasiewicz condition and provides examples of neural networks satisfying these assumptions.
Findings
SGD converges locally with positive probability under the local Lojasiewicz condition.
The convergence result also relies on an additional local structural assumption on the loss landscape.
Examples of neural networks satisfying the assumptions are provided.
Abstract
We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish the local convergence with positive probability under the local Łojasiewicz condition introduced by Chatterjee in \cite{chatterjee2022convergence} and an additional local structural assumption of the loss function landscape. A key component of our proof is to ensure that the whole trajectories of SGD stay inside the local region with a positive probability. We also provide examples of neural networks with finite widths such that our assumptions hold.
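The statement that SGD trajectories stay inside a local region with positive probability can be illustrated numerically. The sketch below is not the paper's construction: the toy loss (a width-1, two-layer linear "network"), the multiplicative noise model, the step size, the radius, and all function names are assumptions chosen for illustration. It also assumes the Łojasiewicz-type inequality takes the form of a lower bound on |∇f|²/f near the initial point.

```python
import math
import random

def loss(w):
    """Toy non-convex loss: the product x*y fitting the target 1 (illustrative only)."""
    x, y = w
    return 0.5 * (x * y - 1.0) ** 2

def grad(w):
    """Exact gradient of the toy loss."""
    x, y = w
    residual = x * y - 1.0
    return (residual * y, residual * x)

def lojasiewicz_ratio(w):
    """|grad f|^2 / f, assumed here to be the quantity bounded below; undefined where f = 0."""
    gx, gy = grad(w)
    return (gx * gx + gy * gy) / loss(w)

def run_sgd(w0, radius, rng, step=0.05, noise=1.0, n_steps=1000):
    """Run noisy gradient steps; report whether the iterates stayed in the ball B(w0, radius)."""
    w = w0
    for _ in range(n_steps):
        gx, gy = grad(w)
        # Assumed noise model (not from the paper): the gradient is rescaled by a
        # random factor, so the noise vanishes wherever the gradient vanishes.
        scale = 1.0 + noise * rng.gauss(0.0, 1.0)
        w = (w[0] - step * scale * gx, w[1] - step * scale * gy)
        if math.dist(w, w0) > radius:
            return w, False  # the trajectory left the local region
    return w, True

def estimate_probabilities(n_runs=100, seed=0):
    """Monte Carlo estimate of P(stay in ball) and P(stay in ball and reach near-zero loss)."""
    rng = random.Random(seed)
    w0, radius = (1.5, 1.0), 0.5
    stayed = converged = 0
    for _ in range(n_runs):
        w, inside = run_sgd(w0, radius, rng)
        stayed += inside
        converged += inside and loss(w) < 1e-8
    return stayed / n_runs, converged / n_runs

if __name__ == "__main__":
    print("ratio at w0:", lojasiewicz_ratio((1.5, 1.0)))
    print("stay / converge fractions:", estimate_probabilities())
```

For this toy loss the ratio equals 2(x² + y²), which is bounded away from zero on any ball that excludes the origin, so the inequality holds locally. Shrinking `radius` or raising `noise` is a simple way to see the fraction of trajectories that stay inside the ball decrease.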
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Statistical Methods and Inference
Methods: Stochastic Gradient Descent
