Convergence of continuous-time stochastic gradient descent with applications to deep neural networks
Gabor Lugosi, Eulalia Nualart

TL;DR
This paper analyzes the convergence of a continuous-time approximation of stochastic gradient descent, providing conditions for convergence and applying these results to overparametrized neural networks.
Contribution
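The paper extends the convergence results of Chatterjee (2022), established for (nonstochastic) gradient descent, to a continuous-time approximation of stochastic gradient descent, and shows how the main result applies to training overparametrized deep neural networks.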
Findings
Established general convergence conditions for continuous-time stochastic gradient descent
Extended convergence analysis from nonstochastic to stochastic gradient descent
Applied theoretical results to overparametrized neural network training
Abstract
We study a continuous-time approximation of the stochastic gradient descent process for minimizing the population expected loss in learning problems. The main results establish general sufficient conditions for convergence, extending the results of Chatterjee (2022) established for (nonstochastic) gradient descent. We show how the main result can be applied to the case of overparametrized neural network training.
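To make the idea of a continuous-time approximation of stochastic gradient descent concrete, the following is a minimal sketch and not the paper's construction. It assumes the process can be modeled by a stochastic differential equation of the form dw = -grad f(w) dt + sqrt(eta) * sigma(w) dB, simulated with the Euler-Maruyama scheme. The function name `simulate_sde`, the toy loss f(w) = w^2 / 2, the noise coefficient sigma(w) = w (chosen so that the noise vanishes at the minimizer, loosely mimicking an overparametrized setting), and all numerical parameters are hypothetical choices made for illustration.

```python
import math
import random


def simulate_sde(w0, eta, grad, sigma, dt=0.01, n_steps=1000, seed=0):
    """Euler-Maruyama discretization of dw = -grad(w) dt + sqrt(eta) * sigma(w) dB.

    The SDE form is an assumption made for this sketch, not taken from the paper.
    """
    rng = random.Random(seed)
    w = w0
    path = [w]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal draw; sqrt(dt) scaling applied below
        w = w - grad(w) * dt + math.sqrt(eta * dt) * sigma(w) * z
        path.append(w)
    return path


# Hypothetical toy problem: f(w) = w^2 / 2, so grad f(w) = w.
def grad(w):
    return w


# Hypothetical noise coefficient that vanishes at the minimizer w = 0.
def sigma(w):
    return w


path = simulate_sde(w0=1.0, eta=0.1, grad=grad, sigma=sigma)
print(f"w(0) = {path[0]:.3f}, w(T) = {path[-1]:.3e}")
```

Setting `eta=0.0` removes the noise term, so the same routine reduces to an explicit Euler discretization of the (nonstochastic) gradient flow, which is a convenient baseline for comparing the two regimes.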
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM
