Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

Jing An; Jianfeng Lu

arXiv:2304.09221 · cs.LG · January 17, 2024 · 1 citation

TL;DR

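This paper proves that, with positive probability, stochastic gradient descent converges locally for non-convex objectives that satisfy the local Lojasiewicz condition and an additional local structural assumption on the loss landscape. It also gives examples of finite-width neural networks for which these assumptions hold.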

Contribution

It establishes local convergence of SGD under the local Lojasiewicz condition and provides examples of neural networks satisfying these assumptions.

Findings

01

SGD converges locally with positive probability under the local Lojasiewicz condition.

02

The paper introduces an additional local structural assumption on the loss landscape.

03

Examples of neural networks satisfying the assumptions are provided.

Abstract

We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish the local convergence with positive probability under the local Łojasiewicz condition introduced by Chatterjee (2022) and an additional local structural assumption of the loss function landscape. A key component of our proof is to ensure that the whole trajectories of SGD stay inside the local region with a positive probability. We also provide examples of neural networks with finite widths such that our assumptions hold.
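
The idea that whole SGD trajectories stay inside a local region with positive probability can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the paper's construction: the toy two-parameter objective f(x, y) = (xy - 1)^2 / 2, the starting point, the ball radius, the step size, and the Gaussian noise scaled by the square root of the loss are all hypothetical choices made for illustration.

```python
import math
import random


def loss(x, y):
    # Toy non-convex objective (an assumption, not taken from the paper).
    return 0.5 * (x * y - 1.0) ** 2


def grad(x, y):
    # Exact gradient of the toy objective.
    r = x * y - 1.0
    return r * y, r * x


def run_sgd(x0, y0, radius, step, noise, n_steps, rng):
    """Run one noisy trajectory; return (stayed_inside_ball, final_loss)."""
    x, y = x0, y0
    for _ in range(n_steps):
        gx, gy = grad(x, y)
        # Assumed noise model: Gaussian, scaled by sqrt(loss), so it
        # vanishes at global minima of the toy objective.
        scale = noise * math.sqrt(loss(x, y))
        x -= step * (gx + scale * rng.gauss(0.0, 1.0))
        y -= step * (gy + scale * rng.gauss(0.0, 1.0))
        # Stop as soon as the iterate leaves the ball around the start.
        if math.hypot(x - x0, y - y0) > radius:
            return False, loss(x, y)
    return True, loss(x, y)


def fraction_staying(n_runs=200, seed=0):
    """Empirical fraction of runs whose whole trajectory stays in the ball."""
    rng = random.Random(seed)
    results = [
        run_sgd(1.5, 1.0, radius=1.0, step=0.05, noise=0.5,
                n_steps=500, rng=rng)
        for _ in range(n_runs)
    ]
    final_losses = [final for stayed, final in results if stayed]
    return len(final_losses) / n_runs, final_losses


if __name__ == "__main__":
    frac, final_losses = fraction_staying()
    print("fraction of runs staying inside the ball:", frac)
    if final_losses:
        print("largest final loss among those runs:", max(final_losses))
```

The empirical fraction is only a Monte Carlo estimate for this toy setting. It says nothing about the constants or assumptions in the paper's theorem, which concerns a lower bound on the probability rather than a simulated frequency.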

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Statistical Methods and Inference

Methods: Stochastic Gradient Descent