Normal Approximation for Stochastic Gradient Descent via Non-Asymptotic Rates of Martingale CLT

Andreas Anastasiou; Krishnakumar Balasubramanian; Murat A. Erdogdu

arXiv:1904.02130·math.ST·April 4, 2019·COLT·5 cites

Open Access

TL;DR

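This paper establishes explicit non-asymptotic convergence rates for the normal approximation of Polyak-Ruppert averaged stochastic gradient descent (SGD), over a class of twice-differentiable test functions, by way of a non-asymptotic martingale CLT. The results have potential consequences for confidence intervals and hypothesis tests that remain valid in a non-asymptotic sense.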

Contribution

It introduces a non-asymptotic martingale CLT with explicit rates and applies it to analyze the convergence of averaged SGD to a normal distribution.

Findings

01 Explicit non-asymptotic rates for a multivariate martingale CLT.

02 Convergence rates for averaged SGD to a normal distribution.

03 Implications for confidence intervals and hypothesis testing with SGD.

Abstract

We provide non-asymptotic convergence rates of the Polyak-Ruppert averaged stochastic gradient descent (SGD) to a normal random vector for a class of twice-differentiable test functions. A crucial intermediate step is proving a non-asymptotic martingale central limit theorem (CLT), i.e., establishing the rates of convergence of a multivariate martingale difference sequence to a normal random vector, which might be of independent interest. We obtain explicit rates for the multivariate martingale CLT using a combination of Stein's method and Lindeberg's argument, and then use them in conjunction with a non-asymptotic analysis of averaged SGD proposed in [PJ92]. Our results have potentially interesting consequences for computing confidence intervals for parameter estimation with SGD and constructing hypothesis tests with SGD that are valid in a non-asymptotic sense.
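As a rough illustration of the object studied in the abstract, the following minimal sketch simulates Polyak-Ruppert averaged SGD on a toy one-dimensional problem and inspects the rescaled error sqrt(n) * (theta_bar - theta*). Everything in it is an assumption made for illustration and is not taken from the paper: the objective f(theta) = E[(theta - X)^2] / 2 with X ~ N(mu, sigma^2), the step size eta0 * t^(-alpha), and the names `averaged_sgd` and `scaled_errors`. For this toy objective the Hessian is 1 and the gradient noise has variance sigma^2, so the classical asymptotic limit of the rescaled error is N(0, sigma^2). The sketch does not compute the paper's bounds; it only shows the finite-n quantity those bounds concern.

```python
import math
import random
import statistics


def averaged_sgd(samples, theta0=0.0, eta0=0.5, alpha=0.75):
    """Polyak-Ruppert averaged SGD for the toy objective E[(theta - X)^2] / 2.

    All parameter names and defaults are illustrative assumptions.
    """
    theta = theta0
    running_sum = 0.0
    n = 0
    for t, x in enumerate(samples, start=1):
        step = eta0 * t ** (-alpha)   # decaying step size with alpha in (1/2, 1)
        grad = theta - x              # unbiased stochastic gradient at theta
        theta -= step * grad          # plain SGD update
        running_sum += theta          # accumulate iterates for the average
        n = t
    return running_sum / n            # averaged iterate theta_bar


def scaled_errors(n=2000, reps=300, mu=2.0, sigma=1.0, seed=0):
    """Monte Carlo draws of sqrt(n) * (theta_bar - mu) over independent runs."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        samples = (rng.gauss(mu, sigma) for _ in range(n))
        theta_bar = averaged_sgd(samples)
        errors.append(math.sqrt(n) * (theta_bar - mu))
    return errors


if __name__ == "__main__":
    errs = scaled_errors()
    # For large n these should sit near 0 and sigma^2 respectively; the gap
    # at finite n is what a non-asymptotic rate is meant to control.
    print("mean of scaled error:    ", statistics.mean(errs))
    print("variance of scaled error:", statistics.variance(errs))
```

Rerunning with different values of `n` gives a feel for how quickly the empirical mean and variance settle toward the limiting values, which is the kind of finite-sample behavior that non-asymptotic rates quantify.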

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Random Matrices and Applications

Methods: Stochastic Gradient Descent