Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem
Shaan Ul Haque, Siva Theja Maguluri

TL;DR
This paper proves a general-purpose theorem for non-linear stochastic approximation driven by unbounded Markovian noise, which yields finite-time bounds for a range of reinforcement learning and optimization algorithms.
Contribution
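The paper provides a black-box theorem for non-linear stochastic approximation with unbounded Markovian noise: given a Lyapunov function that satisfies a drift condition, the theorem yields finite-time bounds, extending them to settings with unbounded state spaces and rewards.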
Findings
Finite-time bounds, with optimal sample complexity, for average-reward TD learning with linear function approximation over unbounded state spaces and rewards.
Improved finite-time bounds for Q-learning with broader policy classes.
First finite-time bounds for distributed stochastic optimization with cyclic block coordinate descent.
Abstract
Motivated by engineering applications such as resource allocation in networks and inventory systems, we consider average-reward Reinforcement Learning with unbounded state space and reward function. Recent works studied this problem in the actor-critic framework and established finite sample bounds assuming access to a critic with certain error guarantees. We complement their work by studying Temporal Difference (TD) learning with linear function approximation and establishing finite-time bounds with the optimal sample complexity. These results are obtained using the following general-purpose theorem for non-linear Stochastic Approximation (SA). Suppose that one constructs a Lyapunov function for a non-linear SA with a certain drift condition. Then, our theorem establishes finite-time bounds when this SA is driven by unbounded Markovian noise under…
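To make the setting in the abstract concrete, here is a minimal sketch of average-reward TD(0) with linear function approximation on a chain with an unbounded state space and an unbounded reward. Everything specific in it is an assumption for illustration and is not taken from the paper: the discrete-time single-server queue, the reward r(s) = -s, the two-dimensional feature map, the step-size schedule, and the function names `features`, `step_queue`, and `average_reward_td0`. The noise is Markovian because consecutive samples come from one trajectory of the queue, and it is unbounded because both the reward and the features grow with the queue length.

```python
import random


def features(s):
    """Hypothetical feature map: scaled linear and quadratic terms in the queue length."""
    x = s / 10.0
    return (x, x * x)


def step_queue(s, rng, p_arrive=0.3, p_serve=0.5):
    """One transition of a discrete-time single-server queue on {0, 1, 2, ...}."""
    u = rng.random()
    if u < p_arrive:
        return s + 1          # arrival
    if u < p_arrive + p_serve and s > 0:
        return s - 1          # departure (only possible if the queue is non-empty)
    return s                  # no change


def average_reward_td0(num_steps=200_000, seed=0):
    """Run average-reward TD(0) along a single trajectory of the queue.

    Returns the learned weight vector and the running estimate of the
    long-run average reward.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]        # weights of the linear value approximation
    r_bar = 0.0               # estimate of the average reward
    s = 0
    for k in range(num_steps):
        s_next = step_queue(s, rng)
        r = -float(s)         # unbounded reward: negative queue length
        phi = features(s)
        phi_next = features(s_next)
        v = theta[0] * phi[0] + theta[1] * phi[1]
        v_next = theta[0] * phi_next[0] + theta[1] * phi_next[1]
        # Differential TD error for the average-reward criterion.
        delta = r - r_bar + v_next - v
        # Diminishing step size (an assumed schedule, not the paper's).
        alpha = 50.0 / (k + 5000.0)
        theta[0] += alpha * delta * phi[0]
        theta[1] += alpha * delta * phi[1]
        # Running mean of the observed rewards.
        r_bar += (r - r_bar) / (k + 1)
        s = s_next
    return theta, r_bar


if __name__ == "__main__":
    theta, r_bar = average_reward_td0()
    print("average-reward estimate:", r_bar)
    print("weights:", theta)
```

With the assumed arrival and service probabilities the queue is stable, so the running reward estimate settles near the negative of the stationary mean queue length. The weight iterates are exactly the kind of recursion for which a finite-time bound is of interest, since each update is scaled by features and rewards that have no uniform bound.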
Taxonomy
Topics: Stochastic processes and financial applications · Probabilistic and Robust Engineering Design
