Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging
Chandrashekar Lakshminarayanan, Csaba Szepesvári

TL;DR
This paper analyzes linear stochastic approximation algorithms with constant step-size and iterate averaging, providing bounds on mean squared error decay and discussing uniform step-size selection across data distributions, with implications for reinforcement learning.
Contribution
The paper gives theoretical bounds on the MSE decay of LSAs with Polyak-Ruppert (PR) averaging under a fixed step-size, and examines when a single step-size can be applied uniformly across data distributions.
Findings
MSE decays as O(1/t) under certain step-size conditions
Not all data distributions allow a uniform constant step-size
Heuristic algorithm for step-size tuning is proposed
Abstract
We consider d-dimensional linear stochastic approximation algorithms (LSAs) with a constant step-size and the so-called Polyak-Ruppert (PR) averaging of iterates. LSAs are widely applied in machine learning and reinforcement learning (RL), where the aim is to compute an appropriate θ* ∈ R^d (that is, an optimum or a fixed point) using noisy data and O(d) updates per iteration. In this paper, we are motivated by the problem (in RL) of policy evaluation from experience replay using the temporal difference (TD) class of learning algorithms, which are also LSAs. For LSAs with a constant step-size and PR averaging, we provide bounds for the mean squared error (MSE) after t iterations. We assume that the data is i.i.d. with finite variance (the underlying distribution being P) and that the expected dynamics is Hurwitz. For a given LSA with PR averaging, and data…
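The setting in the abstract can be illustrated with a minimal sketch, assuming the standard LSA update theta <- theta + alpha * (b_t - A_t theta) with i.i.d. noisy pairs (A_t, b_t) and a running mean of the iterates as the PR average. The function names, the 2-dimensional toy problem, the noise level, and the step-size below are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import random


def lsa_with_pr_averaging(sample, theta0, alpha, num_steps):
    """Constant step-size LSA plus a running (Polyak-Ruppert) average.

    sample() must return one noisy pair (A, b); this interface is an
    assumption of the sketch, not the paper's notation.
    """
    d = len(theta0)
    theta = list(theta0)
    theta_bar = list(theta0)  # average of theta_0, ..., theta_t
    for t in range(1, num_steps + 1):
        A, b = sample()
        # LSA step: theta <- theta + alpha * (b - A theta)
        theta = [
            theta[i] + alpha * (b[i] - sum(A[i][j] * theta[j] for j in range(d)))
            for i in range(d)
        ]
        # Incremental mean over the t + 1 iterates seen so far
        theta_bar = [
            theta_bar[i] + (theta[i] - theta_bar[i]) / (t + 1) for i in range(d)
        ]
    return theta, theta_bar


def make_toy_sampler(seed, noise=0.3):
    """Hypothetical 2-d problem: E[A] has eigenvalues 1.0 and 0.5, so the
    expected dynamics is stable; every entry gets i.i.d. Gaussian noise."""
    rng = random.Random(seed)
    a_mean = [[1.0, 0.2], [0.0, 0.5]]
    theta_star = [1.0, -2.0]
    # Choose E[b] so that theta_star solves E[A] theta = E[b]
    b_mean = [sum(a_mean[i][j] * theta_star[j] for j in range(2)) for i in range(2)]

    def sample():
        A = [[a_mean[i][j] + noise * rng.gauss(0.0, 1.0) for j in range(2)]
             for i in range(2)]
        b = [b_mean[i] + noise * rng.gauss(0.0, 1.0) for i in range(2)]
        return A, b

    return sample, theta_star


if __name__ == "__main__":
    sample, theta_star = make_toy_sampler(seed=0)
    last, averaged = lsa_with_pr_averaging(sample, [0.0, 0.0], alpha=0.05,
                                           num_steps=20000)
    print("target       :", theta_star)
    print("last iterate :", last)
    print("PR average   :", averaged)
```

With a constant step-size the last iterate keeps fluctuating around the target, while the averaged iterate is the quantity whose MSE the abstract bounds. The step-size 0.05 was picked by hand for this toy distribution; whether a single step-size works across distributions is exactly the question the paper studies.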
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
Methods: Experience Replay
