Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning

Zaiwei Chen; Sheng Zhang; Thinh T. Doan; John-Paul Clarke; Siva Theja Maguluri

arXiv:1905.11425 · math.OC · January 27, 2022 · 31 cites

TL;DR

This paper provides a finite-sample convergence analysis of nonlinear stochastic approximation (SA) algorithms under Markovian noise, with applications to reinforcement learning methods such as Q-learning. It shows exponentially fast convergence to a neighborhood of the limit point under constant stepsizes and an $O(\log(k)/k)$ rate under diminishing stepsizes.

Contribution

It establishes finite-sample bounds for nonlinear SA with Markovian noise, applicable to RL algorithms without requiring i.i.d. samples or projection steps.

Findings

1. A constant stepsize yields exponentially fast convergence to a neighborhood (of radius $O(\alpha \log(1/\alpha))$) around the limit point.
2. Diminishing stepsizes achieve an $O(\log(k)/k)$ convergence rate.
3. Numerical results support the theoretical convergence bounds.
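The two stepsize regimes in the findings can be illustrated on a toy problem. What follows is a minimal sketch, not the paper's code or experiments. It assumes the generic SA form $x_{k+1} = x_k + \alpha_k F(x_k, Y_k)$ with a hypothetical scalar operator `F(x, y) = c_y - x - 0.5*tanh(x)` and a hypothetical two-state Markov chain $Y_k$ that switches state with probability `P_SWITCH`. The stepsizes `0.05` and `2/(k + 10)` are illustrative choices, not the ones analyzed in the paper.

```python
import math
import random

# Hypothetical two-state Markov chain Y_k on {0, 1}: switch state w.p. P_SWITCH.
# The chain is symmetric, so its stationary distribution is uniform and
# E[c_Y] = (C[0] + C[1]) / 2 = 2.
P_SWITCH = 0.3
C = (1.0, 3.0)


def F(x: float, y: int) -> float:
    """Hypothetical nonlinear operator F(x, y) = c_y - x - 0.5*tanh(x).

    Its stationary mean 2 - x - 0.5*tanh(x) has slope in [-1.5, -1],
    so it has a unique root x* that attracts the averaged dynamics.
    """
    return C[y] - x - 0.5 * math.tanh(x)


def root_of_mean_field() -> float:
    """Solve 2 - x - 0.5*tanh(x) = 0 by bisection (the mean field is decreasing)."""
    lo, hi = 0.0, 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 2.0 - mid - 0.5 * math.tanh(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def run_sa(stepsize, num_iters: int, seed: int = 0) -> float:
    """Run x_{k+1} = x_k + alpha_k * F(x_k, Y_k) along one Markovian trajectory."""
    rng = random.Random(seed)
    x, y = 0.0, 0
    for k in range(num_iters):
        x += stepsize(k) * F(x, y)
        if rng.random() < P_SWITCH:  # Markovian (correlated, not i.i.d.) noise
            y = 1 - y
    return x


if __name__ == "__main__":
    x_star = root_of_mean_field()
    const = run_sa(lambda k: 0.05, 100_000)
    dimin = run_sa(lambda k: 2.0 / (k + 10), 100_000)
    print(f"x* = {x_star:.4f}")
    print(f"constant stepsize:    |x - x*| = {abs(const - x_star):.4f}")
    print(f"diminishing stepsize: |x - x*| = {abs(dimin - x_star):.4f}")
```

With the constant stepsize the iterate keeps fluctuating in a neighborhood of $x^*$ whose size shrinks with $\alpha$. With the diminishing stepsize the error keeps decreasing toward zero.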

Abstract

Motivated by applications in reinforcement learning (RL), we study a nonlinear stochastic approximation (SA) algorithm under Markovian noise and establish its finite-sample convergence bounds under various stepsizes. Specifically, we show that when using a constant stepsize (i.e., $\alpha_k \equiv \alpha$), the algorithm achieves exponentially fast convergence to a neighborhood (with radius $O(\alpha \log(1/\alpha))$) around the desired limit point. When using diminishing stepsizes with an appropriate decay rate, the algorithm converges at rate $O(\log(k)/k)$. Our proof is based on Lyapunov drift arguments, and to handle the Markovian noise, we exploit the fast mixing of the underlying Markov chain. To demonstrate the generality of our theoretical results on Markovian SA, we use them to derive finite-sample bounds for the popular $Q$-learning with linear function approximation algorithm,…
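The sketch below makes the $Q$-learning application concrete. It writes out the standard $Q$-learning update with linear function approximation, $\theta_{k+1} = \theta_k + \alpha_k \phi(s_k, a_k)\big(r_k + \gamma \max_{a'} \phi(s_{k+1}, a')^\top \theta_k - \phi(s_k, a_k)^\top \theta_k\big)$, along a single Markovian trajectory. The MDP (rewards `R`, transition probabilities `P1`, discount `GAMMA`), the uniform behavior policy, and the stepsize `alpha0 / (k + h)` are all hypothetical. They are not the setup of the paper or of the gt-coar/Q-Learning-LFA repository. One-hot features are chosen only because they make the linear scheme coincide with tabular $Q$-learning, whose limit is easy to check. With general features, $Q$-learning with linear function approximation can diverge.

```python
import random

GAMMA = 0.5
STATES, ACTIONS = (0, 1), (0, 1)
# Hypothetical toy MDP: deterministic rewards R[s][a] and
# P1[s][a] = Pr(next state = 1 | s, a).
R = [[1.0, 0.0], [0.0, 2.0]]
P1 = [[0.3, 0.6], [0.8, 0.5]]


def phi(s, a):
    """One-hot features (assumed); linear Q-learning then reduces to tabular."""
    v = [0.0] * 4
    v[2 * s + a] = 1.0
    return v


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def q_learning_lfa(num_iters, alpha0=20.0, h=200.0, seed=0):
    """Q-learning with linear function approximation on one Markovian trajectory."""
    rng = random.Random(seed)
    theta = [0.0] * 4
    s = 0
    for k in range(num_iters):
        a = rng.choice(ACTIONS)  # uniform behavior policy (assumed)
        s_next = 1 if rng.random() < P1[s][a] else 0
        f = phi(s, a)
        # TD error with a max over next-state actions
        target = R[s][a] + GAMMA * max(dot(phi(s_next, b), theta) for b in ACTIONS)
        td = target - dot(f, theta)
        alpha = alpha0 / (k + h)  # diminishing stepsize (illustrative)
        theta = [t + alpha * td * fi for t, fi in zip(theta, f)]
        s = s_next
    return theta


def q_star(iters=200):
    """Value iteration on the known toy model, used only for comparison."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        v = [max(q[s]) for s in STATES]
        q = [[R[s][a] + GAMMA * ((1 - P1[s][a]) * v[0] + P1[s][a] * v[1])
              for a in ACTIONS] for s in STATES]
    return q


if __name__ == "__main__":
    theta, qs = q_learning_lfa(100_000), q_star()
    for s in STATES:
        for a in ACTIONS:
            print(s, a, round(theta[2 * s + a], 3), round(qs[s][a], 3))
```

Because the samples come from a single trajectory, consecutive updates are correlated. This is the kind of Markovian noise the abstract's analysis is designed to handle.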

Code & Models

Repositories

gt-coar/Q-Learning-LFA

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural dynamics and brain function

Methods: Q-Learning