Convergence rates and approximation results for SGD and its continuous-time counterpart

Xavier Fontaine, Valentin De Bortoli, and Alain Durmus

arXiv:2004.04193 · math.OC · February 2, 2021 · COLT · 1 citation

TL;DR

This paper provides a comprehensive theoretical analysis of SGD with non-increasing step sizes, approximating it with stochastic differential equations and establishing convergence bounds, including for non-convex functions.

Contribution

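It approximates SGD by solutions of a time-inhomogeneous SDE through an appropriate coupling, develops new comparison techniques for the long-time behavior of the continuous processes and transfers them to the discrete SGD sequences, improves non-asymptotic bounds for convex SGD under weaker assumptions, and extends convergence results to non-convex settings.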

Findings

01. SGD can be approximated by solutions of a time-inhomogeneous SDE.

02. New comparison techniques for analyzing the long-time behavior of the continuous processes, which also carry over to the discrete SGD sequences.

03. Improved non-asymptotic bounds for convex SGD under weaker assumptions.
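As a worked illustration of the first finding, here is a short derivation under our own assumptions (it is not the paper's theorem statement). Take step sizes gamma_n = gamma n^(-alpha) with gamma > 0 and alpha in [0, 1), gradient noise Z_{n+1} with mean zero and covariance Sigma(X_n), and the time-inhomogeneous SDE written below with the time scale gamma_alpha = gamma^(1/(1-alpha)). One Euler-Maruyama step of size gamma_alpha from t_n = n gamma_alpha then reproduces one SGD step exactly when the noise is Gaussian.

```latex
\begin{aligned}
&\text{SGD:} && X_{n+1} = X_n - \gamma_{n+1}\bigl(\nabla f(X_n) + Z_{n+1}\bigr), \qquad \gamma_n = \gamma\, n^{-\alpha},\\
&\text{SDE (assumed form):} && \mathrm{d}X_t = -(\gamma_\alpha + t)^{-\alpha}\bigl\{\nabla f(X_t)\,\mathrm{d}t + \gamma_\alpha^{1/2}\,\Sigma(X_t)^{1/2}\,\mathrm{d}B_t\bigr\}, \qquad \gamma_\alpha = \gamma^{1/(1-\alpha)},\\
&\text{Euler step at } t_n = n\gamma_\alpha: && X_{n+1} = X_n - (\gamma_\alpha + n\gamma_\alpha)^{-\alpha}\bigl\{\gamma_\alpha \nabla f(X_n) + \gamma_\alpha^{1/2}\,\Sigma(X_n)^{1/2}\,(B_{t_{n+1}} - B_{t_n})\bigr\},\\
&\text{drift coefficient:} && (\gamma_\alpha + n\gamma_\alpha)^{-\alpha}\gamma_\alpha = \gamma_\alpha^{1-\alpha}(n+1)^{-\alpha} = \gamma\,(n+1)^{-\alpha} = \gamma_{n+1},\\
&\text{noise term:} && B_{t_{n+1}} - B_{t_n} = \gamma_\alpha^{1/2} G_{n+1},\ G_{n+1} \sim \mathcal{N}(0, I)
\;\Rightarrow\; (\gamma_\alpha + n\gamma_\alpha)^{-\alpha}\gamma_\alpha^{1/2}\,\Sigma(X_n)^{1/2}\,\gamma_\alpha^{1/2} G_{n+1} = \gamma_{n+1}\,\Sigma(X_n)^{1/2} G_{n+1}.
\end{aligned}
```

So the discretized SDE coincides with SGD whose noise is Z_{n+1} = Sigma(X_n)^{1/2} G_{n+1}. For non-Gaussian noise, such as batch noise, the two recursions no longer coincide. Quantifying that gap is where an approximation argument like the coupling described in the abstract is needed.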

Abstract

This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of a batch noise we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods by their continuous counterpart, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that purpose, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve non-asymptotic…
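The comparison between SGD with non-increasing step sizes and its continuous-time counterpart can be sketched numerically. This is a minimal sketch, not the authors' code. It assumes a hypothetical test problem f(x) = x^2/2 with additive Gaussian gradient noise, step sizes gamma_n = gamma n^(-alpha), and an assumed SDE form dX = -(g_a + t)^(-alpha) [grad f(X) dt + g_a^(1/2) SIGMA dB] with g_a = gamma^(1/(1-alpha)). The names `sgd`, `sde_euler`, `mean_loss`, and `SIGMA` are illustrative. With `substeps=1` and a shared seed, the Euler-Maruyama scheme reproduces the SGD iterates step for step. Larger `substeps` approximate the continuous process more finely.

```python
import math
import random

# Hypothetical test problem (not from the paper): f(x) = x^2 / 2, so grad f(x) = x
# and min f = 0; gradient noise is additive Gaussian with standard deviation SIGMA.
SIGMA = 1.0


def grad_f(x):
    return x


def sgd(x0, gamma, alpha, n_steps, rng):
    """SGD with non-increasing step sizes gamma_n = gamma * n**(-alpha), n >= 1."""
    x = x0
    for n in range(n_steps):
        step = gamma * (n + 1) ** (-alpha)      # gamma_{n+1}
        noise = SIGMA * rng.gauss(0.0, 1.0)     # Z_{n+1}
        x = x - step * (grad_f(x) + noise)
    return x


def sde_euler(x0, gamma, alpha, n_steps, substeps, rng):
    """Euler-Maruyama for the assumed SDE
    dX = -(g_a + t)^(-alpha) [grad f(X) dt + g_a^(1/2) SIGMA dB],  g_a = gamma^(1/(1-alpha)),
    integrated up to T = n_steps * g_a on a grid `substeps` times finer than the SGD grid."""
    g_a = gamma ** (1.0 / (1.0 - alpha))
    h = g_a / substeps
    x, t = x0, 0.0
    for _ in range(n_steps * substeps):
        scale = (g_a + t) ** (-alpha)                  # time-inhomogeneous coefficient
        d_b = math.sqrt(h) * rng.gauss(0.0, 1.0)       # Brownian increment over [t, t + h]
        x = x - scale * (grad_f(x) * h + math.sqrt(g_a) * SIGMA * d_b)
        t += h
    return x


def mean_loss(sampler, n_runs):
    """Monte Carlo estimate of E[f(X)] - min f = E[X^2] / 2."""
    return sum(0.5 * sampler() ** 2 for _ in range(n_runs)) / n_runs


if __name__ == "__main__":
    gamma, alpha, n_steps, n_runs = 0.1, 0.5, 200, 500
    rng = random.Random(0)
    loss_sgd = mean_loss(lambda: sgd(1.0, gamma, alpha, n_steps, rng), n_runs)
    loss_sde = mean_loss(lambda: sde_euler(1.0, gamma, alpha, n_steps, 10, rng), n_runs)
    print(f"E[f] SGD: {loss_sgd:.4f}   SDE (Euler, 10 substeps): {loss_sde:.4f}")
```

The scaling g_a = gamma^(1/(1-alpha)) is chosen so that one Euler step of size g_a has drift coefficient exactly gamma_{n+1}. The fine-grid run is therefore a proxy for the continuous process rather than a restatement of SGD.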

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Random Matrices and Applications · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent