Error analysis for stochastic gradient optimization schemes using modified equations
Charles-Edouard Bréhier, Marc Dambrine, Nassim En-Nebbazi

TL;DR
This paper uses modified equations to analyze the long-time error of stochastic gradient schemes for strongly convex objectives, proving weak error estimates that are uniform in time and deriving a complexity analysis from them.
Contribution
It proves weak error estimates between stochastic gradient schemes and their first- and second-order modified differential equations that hold uniformly in time, whereas earlier estimates were restricted to finite time intervals.
Findings
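Weak error estimates for stochastic gradient schemes that are uniform in time, at first and second orders in the time-step size
The first-order modified equation is deterministic; the second-order modified equation is stochastic and depends on a modified objective function
Numerical experiments illustrate the convergence results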
Abstract
We consider a class of stochastic gradient optimization schemes. Assuming that the objective function is strongly convex, we prove weak error estimates which are uniform in time for the error between the solution of the numerical scheme and the solutions of continuous-time modified (or high-resolution) differential equations at first and second orders, with respect to the time-step size. At first order, the modified equation is deterministic, whereas at second order the modified equation is stochastic and depends on a modified objective function. We go beyond existing results where the error estimates have been considered only on finite time intervals and were not uniform in time. This allows us to then provide a rigorous complexity analysis of the method in the large time and small time-step size regimes. We provide numerical experiments to illustrate the convergence results.
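To make the notion of a first-order weak error estimate that is uniform in time concrete, here is a minimal sketch on a toy problem. It is not taken from the paper. Everything in it is an assumption chosen for illustration: the scalar quadratic objective f(x) = a*x^2/2, the additive Gaussian gradient noise of size sigma, the test function phi(x) = x^2, the parameter names, and the choice of the gradient flow x'(t) = -a*x as the deterministic first-order modified equation. In this model the mean and variance of the iterates satisfy exact recursions, so no sampling is needed.

```python
import math


def sup_weak_error(h, T, a=1.0, sigma=1.0, x0=1.0):
    """Return the max over n*h <= T of |E[x_n^2] - x(n*h)^2|.

    x_n is the SGD iterate for the assumed toy objective f(x) = a*x^2/2,
    and x(t) = x0*exp(-a*t) solves the gradient flow x'(t) = -a*x(t).
    """
    n_steps = int(round(T / h))
    mean, var = x0, 0.0  # exact mean and variance of the SGD iterate
    worst = 0.0
    for n in range(n_steps + 1):
        flow = x0 * math.exp(-a * n * h)  # gradient flow at time t = n*h
        # E[x_n^2] = mean^2 + var, compared with phi(x(t)) = x(t)^2.
        worst = max(worst, abs(mean ** 2 + var - flow ** 2))
        # SGD step: x_{n+1} = x_n - h*(a*x_n + sigma*xi_n), xi_n ~ N(0, 1) i.i.d.
        mean = (1.0 - a * h) * mean
        var = (1.0 - a * h) ** 2 * var + (h * sigma) ** 2
    return worst


if __name__ == "__main__":
    # Compare a moderate and a long time horizon for several step sizes.
    for h in (0.1, 0.05, 0.025):
        print(h, sup_weak_error(h, T=50.0), sup_weak_error(h, T=200.0))
```

In this toy model the stationary variance of the iterates is h*sigma^2/(a*(2 - a*h)), so the largest error is of order h and does not grow when the horizon T is extended. This is the qualitative behaviour that a uniform-in-time first-order estimate describes.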
