Nonasymptotic CLT and Error Bounds for Two-Time-Scale Stochastic Approximation
Seo Taek Kong, Sihan Zeng, Thinh T. Doan, R. Srikant

TL;DR
This paper establishes a nonasymptotic central limit theorem for two-time-scale stochastic approximation algorithms and shows that Polyak-Ruppert averaging attains an expected error rate of $1/\sqrt{n}$, improving on prior finite-time error bounds relevant to machine learning applications.
Contribution
It provides the first nonasymptotic CLT with Wasserstein-1 distance for two-time-scale algorithms, showing optimal $1/\sqrt{n}$ error decay for Polyak-Ruppert averaging.
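A Wasserstein-1 CLT yields an expected-error bound through Kantorovich-Rubinstein duality, because the Euclidean norm is 1-Lipschitz. The sketch below uses notation assumed here rather than taken from the paper: $\bar{x}_n$ is the Polyak-Ruppert average after $n$ steps, $x^\star$ is the fixed point, and $Z$ is the limiting Gaussian vector. The paper's exact statement and constants may differ.

```latex
W_1(\mu,\nu) = \sup_{\|f\|_{\mathrm{Lip}} \le 1} \big| \mathbb{E}_{\mu} f - \mathbb{E}_{\nu} f \big|
\quad\Longrightarrow\quad
\Big| \mathbb{E}\big\|\sqrt{n}\,(\bar{x}_n - x^\star)\big\| - \mathbb{E}\|Z\| \Big|
\le W_1\!\Big(\mathcal{L}\big(\sqrt{n}\,(\bar{x}_n - x^\star)\big),\, \mathcal{L}(Z)\Big)

\text{hence}\qquad
\mathbb{E}\|\bar{x}_n - x^\star\| \le \frac{\mathbb{E}\|Z\| + W_1\!\Big(\mathcal{L}\big(\sqrt{n}\,(\bar{x}_n - x^\star)\big),\, \mathcal{L}(Z)\Big)}{\sqrt{n}}
```

If the Wasserstein-1 term stays bounded, the expected error is $O(1/\sqrt{n})$. If that term vanishes as $n \to \infty$, the leading constant is $\mathbb{E}\|Z\|$.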
Findings
Expected error decays at rate $1/\sqrt{n}$ with Polyak-Ruppert averaging.
Finite-time error bounds are significantly improved over prior results.
First nonasymptotic CLT for two-time-scale stochastic approximation.
Abstract
We consider linear two-time-scale stochastic approximation algorithms driven by martingale noise. Recent applications in machine learning motivate the need to understand finite-time error rates, but conventional stochastic approximation analysis focuses on either asymptotic convergence in distribution or finite-time bounds that are far from optimal. Prior work on asymptotic central limit theorems (CLTs) suggests that two-time-scale algorithms may be able to achieve $1/\sqrt{n}$ error in expectation, with a constant given by the expected norm of the limiting Gaussian vector. However, the best known finite-time rates are much slower. We derive the first nonasymptotic central limit theorem with respect to the Wasserstein-1 distance for two-time-scale stochastic approximation with Polyak-Ruppert averaging. As a corollary, we show that expected error achieved by Polyak-Ruppert averaging decays…
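The sketch below shows what a linear two-time-scale iteration with Polyak-Ruppert averaging looks like. It is a minimal toy simulation under stated assumptions, not the paper's setup. Everything specific is hypothetical: the function name `two_time_scale_pr`, the scalar coefficients (`A11`, `A12`, `A21`, `A22`, `b1`, `b2`), the polynomial step sizes $\alpha_k = (k+1)^{-a}$ for the fast variable and $\beta_k = (k+1)^{-b}$ for the slow variable with $1/2 < a < b < 1$, and the i.i.d. Gaussian noise, which is one simple instance of martingale-difference noise.

```python
import random


def two_time_scale_pr(n, a=0.6, b=0.75, seed=0):
    """Toy scalar linear two-time-scale SA with Polyak-Ruppert averaging.

    Hypothetical system (chosen for illustration, not taken from the paper):
        slow: x <- x + beta_k  * (b1 - A11*x - A12*y + u_k)
        fast: y <- y + alpha_k * (b2 - A21*x - A22*y + v_k)
    with alpha_k = (k+1)^(-a), beta_k = (k+1)^(-b), 1/2 < a < b < 1,
    so beta_k / alpha_k -> 0 (y moves on the faster time scale).
    u_k, v_k are i.i.d. standard Gaussians, a martingale-difference sequence.
    The fixed point solves [[1, .5], [.5, 1]] [x, y]^T = [1, 1]^T,
    i.e. x* = y* = 2/3.
    Returns the running averages (x_bar, y_bar) of all n iterates.
    """
    A11, A12, A21, A22 = 1.0, 0.5, 0.5, 1.0
    b1, b2 = 1.0, 1.0
    rng = random.Random(seed)  # seeded for reproducibility
    x, y = 0.0, 0.0
    x_sum, y_sum = 0.0, 0.0
    for k in range(n):
        alpha = (k + 1) ** (-a)  # fast step size
        beta = (k + 1) ** (-b)   # slow step size
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Simultaneous update: both right-hand sides use the current (x, y).
        x_new = x + beta * (b1 - A11 * x - A12 * y + u)
        y_new = y + alpha * (b2 - A21 * x - A22 * y + v)
        x, y = x_new, y_new
        # Polyak-Ruppert averaging: accumulate iterates, divide at the end.
        x_sum += x
        y_sum += y
    return x_sum / n, y_sum / n


if __name__ == "__main__":
    x_bar, y_bar = two_time_scale_pr(20000)
    print(x_bar, y_bar)  # both should be close to 2/3
```

The averaged iterates, rather than the last iterate, are the quantities whose scaled error $\sqrt{n}(\bar{x}_n - x^\star)$ is described by the CLT. In this toy model, quadrupling `n` should roughly halve the typical error of `x_bar`, though any single seeded run is noisy.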
Taxonomy
Topics: Stochastic processes and financial applications · Statistical Methods and Inference · Stochastic Gradient Optimization Techniques
