Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation

Marina Sheshukova; Denis Belomestny; Alain Durmus; Eric Moulines; Alexey Naumov; Sergey Samsonov

arXiv:2410.05106 · math.OC · August 8, 2025

Open Access

TL;DR

This paper provides a detailed nonasymptotic analysis of stochastic gradient descent with Richardson-Romberg extrapolation, revealing explicit error bounds and higher-order moment estimates for strongly convex optimization.

Contribution

It extends prior work by deriving precise mean-squared error expansions and higher-order bounds for SGD with Richardson-Romberg extrapolation, using Markov chain techniques.

Findings

01. The root mean-squared error decomposes into a leading term of order $O(n^{-1/2})$ and a second-order term of order $O(n^{-3/4})$.

02. The leading term depends explicitly on a minimax-optimal asymptotic covariance matrix.

03. The result extends to higher-order moment bounds.

Abstract

We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase in the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order $O(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $O(n^{-3/4})$, where the power $3/4$ is the best known. We also extend this result to the higher-order moment bounds.…
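The estimator described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It runs constant-step SGD with Polyak-Ruppert averaging at step sizes $\gamma$ and $2\gamma$, then forms the Richardson-Romberg combination $2\bar\theta_n^{(\gamma)} - \bar\theta_n^{(2\gamma)}$, which cancels the first-order (in $\gamma$) bias of the averaged iterates. The toy objective, the additive Gaussian gradient noise, the independent noise draws for the two chains, and all function names (`grad_f`, `averaged_sgd`, `richardson_romberg_sgd`) are assumptions made for illustration only.

```python
import numpy as np


def grad_f(theta):
    """Gradient of a toy objective (an assumption, not taken from the paper).

    f(theta) = sum_i [0.5*theta_i^2 + log(1 + exp(theta_i + 1)) - sigmoid(1)*theta_i]
    is strongly convex and smooth, with minimizer theta* = 0.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return theta + sigmoid(theta + 1.0) - sigmoid(1.0)


def averaged_sgd(step, n, theta0, noise_std, rng):
    """Constant-step SGD; returns the Polyak-Ruppert average of n iterates."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float)).copy()
    running_sum = np.zeros_like(theta)
    for _ in range(n):
        # Stochastic gradient = exact gradient + additive Gaussian noise (assumed model).
        noise = noise_std * rng.standard_normal(theta.shape)
        theta = theta - step * (grad_f(theta) + noise)
        running_sum += theta
    return running_sum / n


def richardson_romberg_sgd(step, n, theta0, noise_std=1.0, seed=0):
    """Combine averaged SGD at steps `step` and `2*step` as 2*avg_small - avg_large."""
    rng = np.random.default_rng(seed)
    # The two chains draw independent noise here; this is a choice of the sketch.
    avg_small = averaged_sgd(step, n, theta0, noise_std, rng)
    avg_large = averaged_sgd(2.0 * step, n, theta0, noise_std, rng)
    return 2.0 * avg_small - avg_large


if __name__ == "__main__":
    estimate = richardson_romberg_sgd(step=0.1, n=20000, theta0=[1.0, -1.0])
    print("Richardson-Romberg estimate of theta* = 0:", estimate)
```

The factor 2 on the small-step average is what the abstract refers to as a mild increase in variance: the combination trades a somewhat noisier estimate for the removal of the leading bias term.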

Taxonomy

Topics: Gas Dynamics and Kinetic Theory

Methods: Stochastic Gradient Descent