Convergence rates and approximation results for SGD and its continuous-time counterpart
Xavier Fontaine, Valentin De Bortoli, and Alain Durmus

TL;DR
This paper gives a theoretical analysis of SGD with non-increasing step sizes: it approximates the SGD recursion by solutions of a time-inhomogeneous stochastic differential equation (SDE) and establishes non-asymptotic convergence bounds, including for non-convex functions.
Contribution
It introduces new approximation techniques for SGD using SDEs, improves non-asymptotic bounds under weaker assumptions, and extends convergence results to non-convex settings.
Findings
SGD can be approximated by solutions of a time-inhomogeneous SDE, using an appropriate coupling.
New comparison techniques are developed for analyzing the long-time behavior of continuous processes.
Non-asymptotic bounds for SGD in the convex setting are improved under weaker assumptions.
Abstract
This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of a batch noise we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods by their continuous counterpart, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that purpose, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve non-asymptotic…
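The setting described above can be illustrated with a minimal sketch of SGD with non-increasing step sizes. Everything specific here is an assumption made for illustration and is not taken from the paper: the objective f(x) = x**2 / 2, the additive Gaussian gradient noise, the step-size schedule gamma / (n + 1)**alpha, and the helper names `step_size`, `sgd_quadratic`, and `mean_objective`.

```python
import random


def step_size(n, gamma=0.5, alpha=0.5):
    """Non-increasing step size gamma / (n + 1)**alpha (illustrative schedule)."""
    return gamma / (n + 1) ** alpha


def sgd_quadratic(x0, n_steps, sigma=1.0, gamma=0.5, alpha=0.5, seed=0):
    """Run SGD on the toy objective f(x) = x**2 / 2.

    The exact gradient is x; we perturb it with Gaussian noise of standard
    deviation sigma (an assumed noise model) and return the whole path.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for n in range(n_steps):
        noisy_grad = x + sigma * rng.gauss(0.0, 1.0)
        x = x - step_size(n, gamma, alpha) * noisy_grad
        path.append(x)
    return path


def mean_objective(n_steps, n_runs=200, x0=2.0, **kwargs):
    """Average f at the last iterate over n_runs independent seeded runs."""
    total = 0.0
    for seed in range(n_runs):
        x_final = sgd_quadratic(x0, n_steps, seed=seed, **kwargs)[-1]
        total += 0.5 * x_final ** 2
    return total / n_runs


if __name__ == "__main__":
    for n_steps in (10, 100, 1000):
        print(n_steps, mean_objective(n_steps))
```

On this toy problem the averaged objective at the last iterate shrinks as the number of steps grows, because the step size, and with it the injected noise, decays. The sketch only shows the qualitative behavior; it does not reproduce the rates or the SDE approximation established in the paper.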
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Random Matrices and Applications · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
