Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning
Zaiwei Chen, Sheng Zhang, Thinh T. Doan, John-Paul Clarke, Siva Theja Maguluri

TL;DR
This paper provides finite-sample convergence analysis for nonlinear stochastic approximation algorithms with Markovian noise, including applications to reinforcement learning methods like Q-learning, demonstrating exponential and sublinear convergence rates.
Contribution
It establishes finite-sample bounds for nonlinear SA with Markovian noise, applicable to RL algorithms without requiring i.i.d. samples or projection steps.
Findings
Constant stepsize yields exponential convergence to a neighborhood.
Diminishing stepsize achieves O(log(k)/k) convergence rate.
Numerical results support theoretical convergence bounds.
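The contrast between the two stepsize regimes can be illustrated with a toy simulation. The following is a minimal sketch, not the paper's experiment: the scalar update function, the two-state Markov chain, and the stepsize constants are all assumptions chosen for illustration. The iterate follows x_{k+1} = x_k + eps_k * F(x_k, Y_k), where Y_k is a slowly mixing Markov chain and the mean update has its root at x* = 0.

```python
import math
import random


def run_sa(stepsize, num_iters=20000, x0=5.0, stay_prob=0.9, seed=0):
    """Run one trajectory of a toy nonlinear SA driven by a two-state Markov chain.

    stepsize: function mapping the iteration index k to the stepsize eps_k.
    All constants here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    x = x0
    y = 0  # current state of the Markov chain
    for k in range(num_iters):
        # Markovian noise: +1 in state 0, -1 in state 1 (zero mean under the
        # stationary distribution, which is uniform for this symmetric chain).
        noise = 1.0 if y == 0 else -1.0
        # Nonlinear update direction; its stationary mean -x - 0.5*tanh(x) vanishes at x* = 0.
        x = x + stepsize(k) * (-x - 0.5 * math.tanh(x) + noise)
        # The chain switches state with probability 1 - stay_prob.
        if rng.random() > stay_prob:
            y = 1 - y
    return x


def mean_squared_error(stepsize, num_runs=30):
    """Average the final squared distance to x* = 0 over independent runs."""
    return sum(run_sa(stepsize, seed=s) ** 2 for s in range(num_runs)) / num_runs


const_err = mean_squared_error(lambda k: 0.05)            # constant stepsize
dimin_err = mean_squared_error(lambda k: 1.0 / (k + 10))  # diminishing stepsize
print("constant stepsize, final mean squared error:   ", const_err)
print("diminishing stepsize, final mean squared error:", dimin_err)
```

In this toy setting the constant-stepsize iterate leaves its starting point quickly but keeps fluctuating in a neighborhood of x*, while the diminishing-stepsize iterate keeps shrinking its error as k grows, which is the qualitative behavior the findings describe.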
Abstract
Motivated by applications in reinforcement learning (RL), we study a nonlinear stochastic approximation (SA) algorithm under Markovian noise, and establish its finite-sample convergence bounds under various stepsizes. Specifically, we show that when using constant stepsize, the algorithm achieves exponentially fast convergence to a neighborhood around the desired limit point. When using diminishing stepsizes with appropriate decay rate, the algorithm converges with rate O(log(k)/k). Our proof is based on Lyapunov drift arguments, and to handle the Markovian noise, we exploit the fast mixing of the underlying Markov chain. To demonstrate the generality of our theoretical results on Markovian SA, we use it to derive the finite-sample bounds of the popular Q-learning with linear function approximation algorithm,…
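For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of a single Q-learning update with linear function approximation in its standard textbook form. The function name, argument names, and the list-based feature representation are assumptions made for illustration; they are not taken from the paper.

```python
def q_learning_step(theta, phi_sa, reward, next_phis, gamma, stepsize):
    """Apply one Q-learning update with a linear Q-function approximation.

    theta:     current weight vector (list of floats)
    phi_sa:    feature vector of the visited state-action pair
    reward:    observed one-step reward
    next_phis: feature vectors of every action available in the next state
    gamma:     discount factor
    stepsize:  stepsize used for this update
    """
    # Approximate Q-value of the visited pair: inner product of features and weights.
    q_sa = sum(t * f for t, f in zip(theta, phi_sa))
    # Greedy value of the next state under the current weights.
    q_next = max(sum(t * f for t, f in zip(theta, phi)) for phi in next_phis)
    # Temporal-difference error of the sampled transition.
    td_error = reward + gamma * q_next - q_sa
    # Move the weights along the feature direction, scaled by the TD error.
    return [t + stepsize * td_error * f for t, f in zip(theta, phi_sa)]


# Usage: two features, reward 1, and a next state offering two actions.
theta = q_learning_step(
    theta=[0.0, 0.0],
    phi_sa=[1.0, 0.0],
    reward=1.0,
    next_phis=[[0.0, 1.0], [1.0, 0.0]],
    gamma=0.9,
    stepsize=0.5,
)
print(theta)
```

Because successive transitions come from a single trajectory of the behavior policy, consecutive samples are correlated, which is why this update is an instance of stochastic approximation with Markovian rather than i.i.d. noise.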
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural dynamics and brain function
Methods: Q-Learning
