Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging

Chandrashekar Lakshminarayanan; Csaba Szepesvári

arXiv:1709.04073·cs.LG·September 14, 2017·2 cites

TL;DR

This paper analyzes linear stochastic approximation algorithms with constant step-size and iterate averaging, providing bounds on mean squared error decay and discussing uniform step-size selection across data distributions, with implications for reinforcement learning.

Contribution

The paper gives bounds on the mean squared error (MSE) of linear stochastic approximation algorithms (LSAs) with Polyak-Ruppert (PR) averaging under a constant step-size, and examines when a single step-size can be used uniformly across data distributions.

Findings

01. MSE decays as $O(1/t)$ under certain step-size conditions

02. Not all data distributions allow a uniform constant step-size

03. Heuristic algorithm for step-size tuning is proposed

Abstract

We consider $d$-dimensional linear stochastic approximation algorithms (LSAs) with a constant step-size and the so-called Polyak-Ruppert (PR) averaging of iterates. LSAs are widely applied in machine learning and reinforcement learning (RL), where the aim is to compute an appropriate $\theta_{*} \in \mathbb{R}^{d}$ (that is, an optimum or a fixed point) using noisy data and $O(d)$ updates per iteration. In this paper, we are motivated by the problem (in RL) of policy evaluation from experience replay using the temporal difference (TD) class of learning algorithms that are also LSAs. For LSAs with a constant step-size, and PR averaging, we provide bounds for the mean squared error (MSE) after $t$ iterations. We assume that data is i.i.d. with finite variance (with underlying distribution $P$) and that the expected dynamics is Hurwitz. For a given LSA with PR averaging, and data…
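The setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: it assumes the common LSA form $\theta_t = \theta_{t-1} + \alpha (b_t - A_t \theta_{t-1})$ with i.i.d. pairs $(A_t, b_t)$, and it takes the stability condition to mean that the mean matrix $A$ has eigenvalues with positive real part. The function names, the Gaussian noise model, the matrix `A`, the vector `b`, and the step-size `alpha = 0.1` are all hypothetical choices made for the example.

```python
import numpy as np


def lsa_pr_average(sample, theta0, alpha, num_steps):
    """Constant step-size LSA with Polyak-Ruppert averaging (illustrative sketch).

    sample() must return one i.i.d. pair (A_t, b_t).
    Returns the last iterate and the running average of all iterates.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    theta_bar = theta.copy()  # average over theta_0 only, so far
    for t in range(1, num_steps + 1):
        A_t, b_t = sample()
        # LSA update with a constant step-size alpha
        theta = theta + alpha * (b_t - A_t @ theta)
        # Incremental mean over theta_0, ..., theta_t
        theta_bar += (theta - theta_bar) / (t + 1)
    return theta, theta_bar


def make_sampler(A, b, noise_std, rng):
    """Hypothetical data model: Gaussian perturbations of a fixed (A, b)."""
    d = len(b)

    def sample():
        A_t = A + noise_std * rng.standard_normal((d, d))
        b_t = b + noise_std * rng.standard_normal(d)
        return A_t, b_t

    return sample


rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5], [-0.5, 1.0]])  # eigenvalues 1 +/- 0.5i (assumed example)
b = np.array([1.0, 2.0])
theta_star = np.linalg.solve(A, b)  # solution of A theta = b

theta_last, theta_avg = lsa_pr_average(
    make_sampler(A, b, noise_std=0.1, rng=rng),
    theta0=np.zeros(2),
    alpha=0.1,
    num_steps=20000,
)
print("last-iterate error:", np.linalg.norm(theta_last - theta_star))
print("averaged error:    ", np.linalg.norm(theta_avg - theta_star))
```

The sketch uses dense noise matrices for simplicity, so each update here costs $O(d^2)$. In TD-style algorithms $A_t$ is typically a rank-one matrix built from feature vectors, which is what makes the $O(d)$ per-iteration cost mentioned in the abstract possible.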

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization

Methods: Experience Replay