Error analysis for stochastic gradient optimization schemes using modified equations

Charles-Edouard Bréhier; Marc Dambrine; Nassim En-Nebbazi

arXiv:2411.05538·math.NA·January 27, 2026

TL;DR

This paper uses modified equations to analyze the long-time error of stochastic gradient schemes for strongly convex objectives. It proves weak error estimates that are uniform in time and derives from them a complexity analysis in the large-time and small time-step regimes.

Contribution

It extends the modified-equation error analysis of stochastic gradient methods from finite time intervals to weak error estimates that hold uniformly in time, at first and second orders in the time-step size.

Findings

01

Weak error estimates, uniform in time, for stochastic gradient schemes with a strongly convex objective

02

The first-order modified equation is deterministic; the second-order modified equation is stochastic and depends on a modified objective function

03

Numerical experiments illustrate the convergence results

Abstract

We consider a class of stochastic gradient optimization schemes. Assuming that the objective function is strongly convex, we prove weak error estimates, uniform in time, for the error between the solution of the numerical scheme and the solutions of continuous-time modified (or high-resolution) differential equations at first and second orders with respect to the time-step size. At first order, the modified equation is deterministic, whereas at second order the modified equation is stochastic and depends on a modified objective function. We go beyond existing results, where the error estimates were considered only on finite time intervals and were not uniform in time. This then allows us to provide a rigorous complexity analysis of the method in the large time and small time-step size regimes. We provide numerical experiments to illustrate the convergence results.
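As an illustration of what a first-order weak error estimate that is uniform in time can look like, here is a minimal Python sketch on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional quadratic objective f(x) = (mu/2) x^2, the additive Gaussian gradient noise of size sigma, the test function phi(x) = x^2, and the function name sup_weak_error. For this toy problem the second moment of the iterates satisfies an exact recursion, so no sampling is needed. The first-order comparison is with the deterministic gradient flow x'(t) = -mu x(t).

```python
import math


def sup_weak_error(h, mu=1.0, sigma=1.0, x0=1.0, T=50.0):
    """Largest gap over 0 < n*h <= T between E[x_n^2] and X(n*h)^2.

    Toy setting (an assumption, not the paper's general framework):
      scheme: x_{n+1} = x_n - h * (mu * x_n + sigma * xi_n), xi_n ~ N(0, 1) i.i.d.
      flow:   X'(t) = -mu * X(t), so X(t) = x0 * exp(-mu * t).
    """
    n_steps = int(round(T / h))
    second_moment = x0 ** 2  # E[x_0^2] for a deterministic start
    worst = 0.0
    for n in range(1, n_steps + 1):
        # Exact recursion: the noise is independent of x_n, mean 0, variance 1.
        second_moment = (1.0 - h * mu) ** 2 * second_moment + (h * sigma) ** 2
        flow_value = (x0 * math.exp(-mu * n * h)) ** 2
        worst = max(worst, abs(second_moment - flow_value))
    return worst


coarse = sup_weak_error(0.1)
fine = sup_weak_error(0.05)
# Halving h should roughly halve the error if it is first order in h.
print("h = 0.10:", coarse)
print("h = 0.05:", fine)
print("ratio   :", coarse / fine)
# Lengthening the horizon should not increase the error if it is uniform in time.
print("T = 200 :", sup_weak_error(0.1, T=200.0))
```

In this toy case the error does not vanish as time grows: the iterates keep a stationary variance of order h, which the deterministic first-order equation cannot capture. This is consistent with the abstract's statement that the second-order modified equation is stochastic.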
