Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning
Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, Alexey Naumov

TL;DR
This paper develops a Gaussian approximation and a multiplier bootstrap method for Polyak-Ruppert averaged linear stochastic approximation (LSA), providing finite-sample bounds and confidence intervals with applications to TD learning.
Contribution
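It establishes a Berry-Esseen bound for the averaged LSA iterates and proves the non-asymptotic validity of multiplier bootstrap confidence intervals, with temporal difference learning under linear function approximation as the illustrating application.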
Findings
Berry-Esseen bound for multivariate normal approximation
Non-asymptotic validity of bootstrap confidence intervals
Application to temporal difference learning
Abstract
In this paper, we obtain the Berry-Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Moreover, we prove the non-asymptotic validity of the confidence intervals for parameter estimation with LSA based on multiplier bootstrap. This procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates upon the arrival of subsequent observations. We illustrate our findings in the setting of temporal difference learning with linear function approximation.
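The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data model, the function names `sample_observation` and `lsa_multiplier_bootstrap`, the step-size schedule `c0 / k**gamma`, the exponential multiplier weights (mean 1, variance 1), and the percentile-style interval are all assumptions made for illustration. Each new observation (A_k, b_k) updates the main LSA iterate and a set of randomly reweighted copies, and running Polyak-Ruppert averages are kept for all of them.

```python
import numpy as np

THETA_STAR = np.array([1.0, -2.0])  # hypothetical target: solves E[A] theta = E[b]


def sample_observation(rng):
    """Toy noisy observation (A_k, b_k) with E[A_k] = I and E[b_k] = THETA_STAR."""
    d = THETA_STAR.size
    A = np.eye(d) + 0.1 * rng.standard_normal((d, d))
    b = THETA_STAR + 0.5 * rng.standard_normal(d)
    return A, b


def lsa_multiplier_bootstrap(n, n_boot=100, c0=0.5, gamma=0.75, seed=0):
    """Run LSA and n_boot perturbed copies online; return the averaged iterates."""
    rng = np.random.default_rng(seed)
    d = THETA_STAR.size
    theta = np.zeros(d)                # main LSA iterate
    theta_b = np.zeros((n_boot, d))    # perturbed iterates, one per row
    avg = np.zeros(d)                  # Polyak-Ruppert average of theta
    avg_b = np.zeros((n_boot, d))      # averages of the perturbed iterates
    for k in range(1, n + 1):
        A, b = sample_observation(rng)
        alpha = c0 / k ** gamma        # decreasing step size (assumed schedule)
        theta = theta - alpha * (A @ theta - b)
        # Independent multiplier weights with mean 1 and variance 1 (assumed law).
        w = rng.exponential(1.0, size=(n_boot, 1))
        theta_b = theta_b - alpha * w * (theta_b @ A.T - b)
        # Running averages, updated as each observation arrives.
        avg += (theta - avg) / k
        avg_b += (theta_b - avg_b) / k
    return avg, avg_b


avg, avg_b = lsa_multiplier_bootstrap(n=3000)
# Bootstrap deviations around the main estimate stand in for its sampling error.
dev = avg_b - avg
lower = avg - np.quantile(dev, 0.975, axis=0)
upper = avg - np.quantile(dev, 0.025, axis=0)
print("estimate:", avg)
print("95% interval, lower:", lower)
print("95% interval, upper:", upper)
```

All perturbed copies reuse the same observation at each step, so the sketch needs only one pass over the data and no stored history, which matches the online character of the procedure described above.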
Taxonomy
Topics: Neural Networks and Applications · Target Tracking and Data Fusion in Sensor Networks · Distributed Sensor Networks and Detection Algorithms
Methods: Sparse Evolutionary Training
