A Multistep Lyapunov Approach for Finite-Time Analysis of Biased Stochastic Approximation

Gang Wang; Bingcong Li; Georgios B. Giannakis

arXiv:1909.04299 · stat.ML · September 2, 2020 · 25 cites

Open Access

TL;DR

This paper introduces a multistep Lyapunov approach for finite-time analysis of biased stochastic approximation algorithms, providing the first non-asymptotic error bounds for unmodified TD and Q-learning with linear function approximation under general conditions.

Contribution

It develops a novel multistep Lyapunov framework that enables finite-time analysis of biased stochastic approximation algorithms in reinforcement learning.

Findings

01 First finite-time error bounds for unmodified TD- and Q-learning.

02 Holds for Markov chains under general mixing conditions.

03 Holds even with nonlinear function approximators, and for Markov chains started from any initial distribution.

Abstract

Motivated by the widespread use of temporal-difference (TD-) and Q-learning algorithms in reinforcement learning, this paper studies a class of biased stochastic approximation (SA) procedures under a mild "ergodic-like" assumption on the underlying stochastic noise sequence. Building upon a carefully designed multistep Lyapunov function that looks ahead to several future updates to accommodate the stochastic perturbations (for control of the gradient bias), we prove a general result on the convergence of the iterates, and use it to derive non-asymptotic bounds on the mean-square error in the case of constant stepsizes. This novel looking-ahead viewpoint renders finite-time analysis of biased SA algorithms under a large family of stochastic perturbations possible. For direct comparison with existing contributions, we also demonstrate these bounds by applying them to TD- and Q-learning…
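As a concrete instance of the setting the abstract describes, the following is a minimal sketch of unmodified TD(0) with linear function approximation, run with a constant stepsize on a single Markov trajectory started from an arbitrary state. Consecutive samples are correlated, so each update is a biased estimate of the mean update direction. The three-state chain, rewards, features, discount factor, and stepsizes are hypothetical choices made for illustration and are not taken from the paper. The sketch does not implement the multistep Lyapunov analysis itself, only the kind of algorithm that analysis covers.

```python
import numpy as np

def td0_linear(P, r, Phi, gamma, stepsize, num_steps, seed=0):
    """Unmodified TD(0) with linear features along one Markov trajectory."""
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    theta = np.zeros(Phi.shape[1])
    s = rng.integers(n_states)  # arbitrary start, not the stationary distribution
    for _ in range(num_steps):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error for the observed transition (s, s_next)
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        # Constant-stepsize update; no projection or other modification
        theta = theta + stepsize * td_error * Phi[s]
        s = s_next
    return theta

def td_fixed_point(P, r, Phi, gamma):
    """Solve A theta = b, the projected Bellman equation for the TD limit point."""
    # Stationary distribution: left eigenvector of P for eigenvalue 1
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmax(np.real(evals))])
    mu = mu / mu.sum()
    D = np.diag(mu)
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)

# Hypothetical toy problem (assumption, for illustration only)
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])
gamma = 0.5

theta_star = td_fixed_point(P, r, Phi, gamma)

# Empirical mean-square error of the final iterate, averaged over independent runs
mse = {}
for eps in (0.1, 0.01):
    errs = [np.sum((td0_linear(P, r, Phi, gamma, eps, 5000, seed=k) - theta_star) ** 2)
            for k in range(10)]
    mse[eps] = float(np.mean(errs))
    print(f"stepsize={eps}: empirical MSE = {mse[eps]:.4f}")
```

With a constant stepsize the iterates are not expected to converge exactly; in a toy run like this one would typically see the error settle at a floor that shrinks as the stepsize is reduced, which is the kind of behavior a non-asymptotic mean-square error bound is meant to quantify.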

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural dynamics and brain function

Methods: Q-Learning