Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation
Marina Sheshukova, Denis Belomestny, Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov

TL;DR
This paper provides a detailed nonasymptotic analysis of stochastic gradient descent with Richardson-Romberg extrapolation, revealing explicit error bounds and higher-order moment estimates for strongly convex optimization.
Contribution
It extends prior work by deriving precise mean-squared error expansions and higher-order bounds for SGD with Richardson-Romberg extrapolation, using Markov chain techniques.
Findings
Root mean-squared error decomposes into leading and second-order terms.
The leading term depends explicitly on a minimax-optimal asymptotic covariance matrix.
Higher-order moment bounds are established.
Abstract
We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase in the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$, where the power $3/4$ is best known. We also extend this result to the higher-order moment bounds.…
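The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the one-dimensional objective f(theta) = theta^2/2 + log cosh(theta - 1), the additive Gaussian gradient noise, the step size, the noise level, and all function names are assumptions chosen for illustration. Two constant-step SGD chains with steps gamma and 2*gamma are run on the same noise sequence (a choice of this sketch), each is averaged in the Polyak-Ruppert sense, and the Richardson-Romberg estimate is formed as 2 * (average at gamma) - (average at 2*gamma), so that the bias terms that are linear in the step size cancel.

```python
import math
import random


def grad(theta):
    # Gradient of the toy objective f(theta) = theta**2 / 2 + log(cosh(theta - 1)).
    # f is 1-strongly convex and 2-smooth but not quadratic, so constant-step
    # SGD has a nonzero asymptotic bias on it.
    return theta + math.tanh(theta - 1.0)


def find_minimizer(lo=-5.0, hi=5.0, iters=200):
    # grad is strictly increasing, so bisection locates its unique root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def averaged_sgd_pair(gamma, n, sigma=2.0, theta0=0.0, seed=0):
    """Run SGD with steps gamma and 2*gamma on shared noise.

    Returns the two Polyak-Ruppert averages (means of the iterates).
    """
    rng = random.Random(seed)
    th1 = th2 = theta0
    sum1 = sum2 = 0.0
    for _ in range(n):
        # Additive Gaussian gradient noise (an assumption of this sketch).
        noise = sigma * rng.gauss(0.0, 1.0)
        th1 -= gamma * (grad(th1) + noise)
        th2 -= 2.0 * gamma * (grad(th2) + noise)
        sum1 += th1
        sum2 += th2
    return sum1 / n, sum2 / n


def richardson_romberg(avg_gamma, avg_2gamma):
    # Bias terms linear in the step size cancel in this combination.
    return 2.0 * avg_gamma - avg_2gamma


theta_star = find_minimizer()
avg_g, avg_2g = averaged_sgd_pair(gamma=0.1, n=400_000)
rr = richardson_romberg(avg_g, avg_2g)

print(f"error, averaged SGD, step gamma  : {avg_g - theta_star:+.5f}")
print(f"error, averaged SGD, step 2*gamma: {avg_2g - theta_star:+.5f}")
print(f"error, Richardson-Romberg        : {rr - theta_star:+.5f}")
```

On a non-quadratic objective like this one, the two averaged estimates would be expected to show a bias that grows with the step size, while the extrapolated estimate should sit closer to the minimizer. The exact numbers depend on the seed and on the assumed noise model.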
Taxonomy
Topics: Gas Dynamics and Kinetic Theory
Methods: Stochastic Gradient Descent
