Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
Stephan Wojtowytsch

TL;DR
This paper analyzes stochastic gradient descent (SGD) with machine learning noise, showing conditions under which it converges rapidly to the global minimum, especially in overparametrized deep learning landscapes.
Contribution
It provides a discrete-time analysis of SGD with realistic noise models, establishing convergence rates and conditions for global optimality in complex energy landscapes.
Findings
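The learning rate in SGD can be kept small but uniformly positive for all times when the energy landscape resembles that of overparametrized deep learning problems.
Exponential convergence to the global minimum holds when the objective function satisfies a Lojasiewicz inequality.
Almost sure convergence to the global minimum holds even for functions that may have local minima, starting from finite energy initializations.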
Abstract
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine learning. The noise encountered in these applications is different from that in many theoretical analyses of stochastic gradient algorithms. In this article, we discuss some of the common properties of energy landscapes and stochastic noise encountered in machine learning problems, and how they affect SGD-based optimization. In particular, we show that the learning rate in SGD with machine learning noise can be chosen to be small, but uniformly positive for all times if the energy landscape resembles that of overparametrized deep learning problems. If the objective function satisfies a Lojasiewicz inequality, SGD converges to the global minimum exponentially fast, and even for functions which may have local minima, we establish almost sure convergence to the global minimum at an exponential rate…
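As a rough illustration of the constant-learning-rate claim, the sketch below runs single-sample SGD on an overparametrized least-squares problem (more parameters than data points), which is a toy stand-in and not an experiment from the paper. In this setting every per-sample gradient vanishes at an interpolating solution, so the gradient noise shrinks together with the loss, and a fixed step size is enough. The function names `full_loss` and `run_sgd`, the problem sizes, and the step size `lr=0.5` are all assumptions chosen for illustration.

```python
import numpy as np


def full_loss(w, X, y):
    """Mean squared-error objective f(w) = (1 / 2n) * sum_i (x_i . w - y_i)^2."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)


def run_sgd(n=20, d=100, lr=0.5, steps=4000, seed=0):
    """Single-sample SGD with a constant learning rate on a toy problem.

    Assumption: d > n, so the data can be fit exactly and the minimum
    value of the objective is zero (the overparametrized regime).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    # Normalize rows so that lr * |x_i|^2 = lr for every sample.
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.standard_normal(n)

    w = np.zeros(d)
    losses = [full_loss(w, X, y)]
    for _ in range(steps):
        i = rng.integers(n)            # draw one data point uniformly
        residual = X[i] @ w - y[i]     # per-sample prediction error
        w -= lr * residual * X[i]      # stochastic gradient step, fixed lr
        losses.append(full_loss(w, X, y))
    return np.array(losses)


losses = run_sgd()
```

Plotting `losses` on a logarithmic axis should show a roughly linear decay, which is what exponential convergence with a constant step looks like in this toy setting. With additive noise of fixed variance, the same constant step would instead leave the iterates fluctuating at a noise floor.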
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Statistical Methods and Inference
Methods: Stochastic Gradient Descent
