Modified Equations for Stochastic Optimization
Stefan Perko

TL;DR
This thesis advances the theory of stochastic modified equations for optimization algorithms, introduces a novel diffusion approximation for SGD without replacement, and analyzes their convergence and scaling limits.
Contribution
The thesis extends the theory of stochastic modified equations (SMEs) to time-inhomogeneous SDEs, applies it to SGD, and introduces epoched Brownian motions (EBMs) for analyzing SGD in the finite-data setting, together with convergence results.
Findings
Proves 1st- and 2nd-order weak approximation properties for certain time-inhomogeneous SDEs driven by Brownian motion.
Develops a continuous-time model for SGD without replacement (SGDo) using Young differential equations driven by EBMs.
Establishes weak convergence of random walks to Gaussian processes determined by permutons.
Abstract
In this thesis, we extend the recently introduced theory of stochastic modified equations (SMEs) for stochastic gradient optimization algorithms. In Ch. 3 we study time-inhomogeneous SDEs driven by Brownian motion. For certain SDEs we prove 1st- and 2nd-order weak approximation properties and, under certain regularity conditions, compute their linear error terms explicitly. In Ch. 4 we instantiate our results for SGD, working out the example of linear regression explicitly. In Ch. 5 we use this example to compare the linear error terms of gradient flow and two commonly used 1st-order SMEs for SGD. In the second part of the thesis we introduce and study a novel diffusion approximation for SGD without replacement (SGDo) in the finite-data setting. In Ch. 6 we motivate and define the notion of an epoched Brownian motion (EBM). We argue that Young differential equations (YDEs)…
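To make the distinction between SGD and SGD without replacement (SGDo) in the finite-data setting concrete, here is a minimal Python sketch on a one-dimensional linear regression problem. It is an illustration only, not the construction used in the thesis: the data model, the function names (`make_data`, `sgd_with_replacement`, `sgd_without_replacement`, `least_squares`), the learning rate, and the number of epochs are all assumptions chosen for the example. The only difference between the two routines is how the sample index is drawn: i.i.d. uniformly at every step versus by a fresh random permutation of the data in every epoch.

```python
import random


def make_data(n, rng):
    # Hypothetical finite data set: y = 2x + small Gaussian noise.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def grad(w, x, y):
    # Gradient in w of the per-sample loss 0.5 * (w * x - y) ** 2.
    return (w * x - y) * x


def sgd_with_replacement(xs, ys, lr, epochs, rng):
    # Classical SGD: each step draws an index i.i.d. uniformly from the data.
    w = 0.0
    n = len(xs)
    for _ in range(epochs * n):
        i = rng.randrange(n)
        w -= lr * grad(w, xs[i], ys[i])
    return w


def sgd_without_replacement(xs, ys, lr, epochs, rng):
    # SGDo: each epoch visits every sample exactly once, in a freshly shuffled order.
    w = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            w -= lr * grad(w, xs[i], ys[i])
    return w


def least_squares(xs, ys):
    # Closed-form minimizer of the empirical risk for the model y ~ w * x.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


rng = random.Random(0)
xs, ys = make_data(50, rng)
w_star = least_squares(xs, ys)
w_sgd = sgd_with_replacement(xs, ys, lr=0.05, epochs=200, rng=rng)
w_sgdo = sgd_without_replacement(xs, ys, lr=0.05, epochs=200, rng=rng)
print(w_star, w_sgd, w_sgdo)
```

Both iterates should end up close to the least-squares solution `w_star`. What differs is the noise structure: in SGDo the gradient noise within an epoch is not independent across steps, because each sample is used exactly once per epoch, and this dependence is the kind of effect that a diffusion approximation for SGDo has to account for.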
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Stochastic processes and financial applications · Gaussian Processes and Bayesian Inference
