Normal Approximation for Stochastic Gradient Descent via Non-Asymptotic Rates of Martingale CLT
Andreas Anastasiou, Krishnakumar Balasubramanian, Murat A. Erdogdu

TL;DR
This paper establishes explicit non-asymptotic convergence rates for the normal approximation of Polyak-Ruppert averaged stochastic gradient descent (SGD) via a new non-asymptotic martingale CLT, with implications for confidence intervals and hypothesis tests that are valid in a non-asymptotic sense.
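A Monte Carlo sketch can make the normal approximation of averaged SGD concrete. The setup is an assumption chosen for illustration and does not come from the paper. It uses least-squares regression y = a^T x* + σ ε with a ~ N(0, I_2), step sizes η_t = 0.3 t^(-0.6), and a hypothetical helper `averaged_sgd_least_squares`. For this model the Hessian is H = E[a a^T] = I and the gradient-noise covariance at x* is S = σ² I, so the classical limit of sqrt(n)(x̄_n − x*) is N(0, H^{-1} S H^{-1}) = N(0, σ² I). The script compares E h(z) with its Gaussian counterpart for the smooth, bounded test function h(z) = cos(u^T z). This choice is convenient because, for Z ~ N(0, Σ), the Gaussian expectation is exactly exp(−u^T Σ u / 2).

```python
import numpy as np


def averaged_sgd_least_squares(x_star, n_steps, n_reps, eta0=0.3, alpha=0.6,
                               sigma=1.0, seed=0):
    """Hypothetical helper: run n_reps independent SGD trajectories on a
    least-squares model and return their Polyak-Ruppert averages, shape (n_reps, d).

    Assumed model: y = a^T x_star + sigma * eps, a ~ N(0, I_d), eps ~ N(0, 1).
    Assumed step size: eta_t = eta0 * t**(-alpha).
    """
    rng = np.random.default_rng(seed)
    d = x_star.shape[0]
    x = np.zeros((n_reps, d))            # all trajectories start at the origin
    running_sum = np.zeros((n_reps, d))  # accumulates x_1 + ... + x_t
    for t in range(1, n_steps + 1):
        a = rng.standard_normal((n_reps, d))                  # fresh features per trajectory
        y = a @ x_star + sigma * rng.standard_normal(n_reps)  # noisy responses
        residual = (a * x).sum(axis=1) - y                    # a^T x - y per trajectory
        grad = residual[:, None] * a                          # gradient of 0.5 * (a^T x - y)^2
        x = x - eta0 * t ** (-alpha) * grad
        running_sum += x
    return running_sum / n_steps


d, n, reps, sigma = 2, 5000, 2000, 1.0
x_star = np.array([1.0, -1.0])
x_bar = averaged_sgd_least_squares(x_star, n, reps, sigma=sigma)
z = np.sqrt(n) * (x_bar - x_star)        # rescaled error of the averaged iterate

# Here H = E[a a^T] = I and the gradient-noise covariance at x_star is sigma^2 I,
# so the limiting covariance H^{-1} S H^{-1} equals sigma^2 I.
Sigma = sigma ** 2 * np.eye(d)

# Smooth, bounded test function h(z) = cos(u^T z); for Z ~ N(0, Sigma),
# E cos(u^T Z) = exp(-u^T Sigma u / 2).
u = np.array([0.5, 0.5])
gap = abs(np.mean(np.cos(z @ u)) - np.exp(-0.5 * u @ Sigma @ u))
emp_cov = np.cov(z, rowvar=False)
print(f"|E h(z) - E h(Z)| ~ {gap:.4f}")
print("empirical covariance of z:\n", emp_cov)
```

At finite n the gap is not zero, and it also includes Monte Carlo error from using finitely many replications. The paper's non-asymptotic rates address a worst-case version of this gap, taken over a class of twice-differentiable test functions.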
Contribution
It introduces a non-asymptotic martingale CLT with explicit rates and applies it to analyze the convergence of averaged SGD to a normal distribution.
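The object of a non-asymptotic multivariate martingale CLT can be written as follows. This is a generic statement of the setting, not the paper's theorem. The exact test-function class, moment conditions, and rates are given in the paper and are not reproduced here, and the notation (ξ_i, 𝓕_i, S_n, V_n, 𝓗) is ours.

```latex
% xi_1, ..., xi_n: martingale difference sequence in R^d w.r.t. a filtration (F_i)
\mathbb{E}[\xi_i \mid \mathcal{F}_{i-1}] = 0, \qquad
S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \xi_i, \qquad
V_n = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[\xi_i \xi_i^\top \mid \mathcal{F}_{i-1}].

% Asymptotic martingale CLT: if V_n -> Sigma in probability and a conditional
% Lindeberg condition holds, then S_n => N(0, Sigma).

% Non-asymptotic question: for each finite n, bound
d_{\mathcal{H}}(S_n, Z) = \sup_{h \in \mathcal{H}}
  \bigl| \mathbb{E}\, h(S_n) - \mathbb{E}\, h(Z) \bigr|,
\qquad Z \sim \mathcal{N}(0, \Sigma),
% where H is a class of twice-differentiable test functions.
```

According to the abstract, the paper bounds this type of quantity by combining Stein's method with Lindeberg's argument.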
Findings
Explicit non-asymptotic rates for the multivariate martingale CLT.
Convergence rates for averaged SGD to a normal distribution.
Implications for confidence intervals and hypothesis testing with SGD.
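The confidence-interval implication can be made concrete with the usual plug-in construction that asymptotic normality of the averaged iterate justifies. For each coordinate j, the interval is x̄_{n,j} ± q sqrt(Σ̂_jj / n), where Σ̂ = Ĥ^{-1} Ŝ Ĥ^{-1} is a sandwich estimate and q is the standard normal quantile for the chosen level. This is a minimal sketch built on our own assumptions: a least-squares loss, one pass over the data, η_t = 0.3 t^(-0.6), and a hypothetical helper `sgd_plugin_ci`. It is not the paper's procedure, and its coverage is only asymptotically nominal. Non-asymptotic rates like the paper's are what would be needed to bound how far its coverage can drift from nominal at finite n.

```python
import numpy as np
from statistics import NormalDist


def sgd_plugin_ci(A, y, eta0=0.3, alpha=0.6, level=0.95):
    """Hypothetical helper: one pass of averaged SGD on least squares over the
    rows of (A, y), then a plug-in normal CI for each coordinate."""
    n, d = A.shape
    x = np.zeros(d)
    x_sum = np.zeros(d)
    for t in range(n):                                   # one sample per step
        a_t, y_t = A[t], y[t]
        step = eta0 * (t + 1) ** (-alpha)
        x = x - step * (a_t @ x - y_t) * a_t             # SGD step on 0.5 * (a^T x - y)^2
        x_sum += x
    x_bar = x_sum / n                                    # Polyak-Ruppert average

    # Plug-in sandwich covariance H^{-1} S H^{-1}, evaluated at x_bar.
    H_hat = A.T @ A / n
    r = A @ x_bar - y                                    # residuals at x_bar
    S_hat = (A * (r ** 2)[:, None]).T @ A / n            # (1/n) sum r_i^2 a_i a_i^T
    H_inv = np.linalg.inv(H_hat)
    Sigma_hat = H_inv @ S_hat @ H_inv

    q = NormalDist().inv_cdf(0.5 + level / 2)            # standard normal quantile
    half_width = q * np.sqrt(np.diag(Sigma_hat) / n)
    return x_bar, x_bar - half_width, x_bar + half_width, Sigma_hat


# Usage on synthetic data (the data-generating model is an assumption for illustration).
rng = np.random.default_rng(1)
n, x_star = 5000, np.array([1.0, -1.0])
A = rng.standard_normal((n, 2))
y = A @ x_star + rng.standard_normal(n)
x_bar, lower, upper, Sigma_hat = sgd_plugin_ci(A, y)
print("estimate:", x_bar)
print("95% intervals:", list(zip(lower, upper)))
```

The sandwich form matters when the gradient noise is heteroscedastic. In this synthetic model both H and S equal the identity, so Σ̂ should come out close to I.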
Abstract
We provide non-asymptotic convergence rates of the Polyak-Ruppert averaged stochastic gradient descent (SGD) to a normal random vector for a class of twice-differentiable test functions. A crucial intermediate step is proving a non-asymptotic martingale central limit theorem (CLT), i.e., establishing the rates of convergence of a multivariate martingale difference sequence to a normal random vector, which might be of independent interest. We obtain the explicit rates for the multivariate martingale CLT using a combination of Stein's method and Lindeberg's argument, which is then used in conjunction with a non-asymptotic analysis of averaged SGD proposed in [PJ92]. Our results have potentially interesting consequences for computing confidence intervals for parameter estimation with SGD and constructing hypothesis tests with SGD that are valid in a non-asymptotic sense.
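The standard Polyak-Juditsky-style decomposition shows where a martingale CLT enters the analysis of averaged SGD. It is written here for a quadratic objective, where it is exact; for general smooth objectives an extra Taylor remainder term appears. The notation is ours:

- x* is the minimizer and H = ∇²f(x*).
- η_t are the step sizes and e_t = x_t − x* is the error.
- ζ_t is the gradient noise, with E[ζ_t | 𝓕_{t−1}] = 0 and covariance S at x*.

```latex
% SGD step and error recursion (quadratic f, so \nabla f(x_t) = H e_t)
x_{t+1} = x_t - \eta_t \bigl(\nabla f(x_t) + \zeta_t\bigr)
\;\Longrightarrow\;
e_{t+1} = e_t - \eta_t (H e_t + \zeta_t)
\;\Longrightarrow\;
H e_t = \frac{e_t - e_{t+1}}{\eta_t} - \zeta_t .

% Sum over t = 1, ..., n, with \bar{x}_n = \frac{1}{n}\sum_{t=1}^{n} x_t:
\sqrt{n}\, H (\bar{x}_n - x^\star)
  = \underbrace{-\frac{1}{\sqrt{n}} \sum_{t=1}^{n} \zeta_t}_{\text{normalized martingale}}
  + \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \frac{e_t - e_{t+1}}{\eta_t}.
```

Under suitable step-size conditions (a common choice is η_t ∝ t^(-α) with α ∈ (1/2, 1)), the second term vanishes and the first converges to N(0, S). This gives the classical limit sqrt(n)(x̄_n − x*) ⇒ N(0, H^{-1} S H^{-1}). A non-asymptotic martingale CLT is what quantifies, at finite n, how far the first term is from its Gaussian limit.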
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Random Matrices and Applications
Methods: Stochastic Gradient Descent
