On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

Alain Durmus; Eric Moulines; Alexey Naumov; Sergey Samsonov; Hoi-To Wai

arXiv:2102.00185·stat.ML·February 2, 2021·5 cites

Open Access

TL;DR

This paper establishes exponential stability for products of random matrices driven by Markov chains, enabling finite-time bounds for stochastic approximation and TD learning algorithms in reinforcement learning.

Contribution

The paper proves an exponential stability result under weaker conditions than existing work, extending the analysis to Markovian noise on general, possibly unbounded, state spaces.

Findings

01. Finite-time $p$-th moment bounds for stochastic approximation schemes.

02. Stability results applicable to general state space Markov chains.

03. Finite-time bounds for TD learning algorithms in reinforcement learning.
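
To make the TD learning setting concrete, here is a minimal sketch of TD(0) with one-hot (tabular) features, which is one instance of linear stochastic approximation driven by a Markov chain. Everything in it is an illustrative assumption and not taken from the paper: the two-state chain, the rewards, the discount factor, the constant step size, and the names `td0`, `REWARD`, `STAY_PROB`, and `DISCOUNT`.

```python
import random

# Hypothetical two-state Markov reward process (not from the paper).
REWARD = {0: 1.0, 1: 0.0}   # reward collected in each state
STAY_PROB = 0.9             # probability of remaining in the current state
DISCOUNT = 0.5              # discount factor


def td0(alpha, n_steps, seed=0):
    """Run constant-stepsize TD(0) with one-hot features.

    With one-hot features the weight vector theta is the value estimate
    itself, and each update is linear in theta, so the recursion is a
    linear stochastic approximation scheme with Markovian noise.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    s = 0
    for _ in range(n_steps):
        # Sample the next state of the chain.
        s_next = s if rng.random() < STAY_PROB else 1 - s
        # Temporal-difference error for the observed transition.
        td_error = REWARD[s] + DISCOUNT * theta[s_next] - theta[s]
        theta[s] += alpha * td_error
        s = s_next
    return theta
```

For this toy chain the Bellman equation gives the exact values $V(0) = 11/6$ and $V(1) = 1/6$, so `td0(0.02, 20000)` should return estimates that fluctuate around those numbers; a constant step size leaves a residual fluctuation rather than exact convergence.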

Abstract

This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. Such stability is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g., for parameter tracking in online learning or reinforcement learning). The existing results impose strong conditions such as uniform boundedness of the matrix-valued functions and uniform ergodicity of the Markov chains. Our main contribution is an exponential stability result for the $p$-th moment of the random matrix product, provided that (i) the underlying Markov chain satisfies a super-Lyapunov drift condition, and (ii) the growth of the matrix-valued functions is controlled by an appropriately defined function (related to the drift condition). Using this result, we give finite-time $p$-th moment bounds for constant and decreasing stepsize linear stochastic…
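
The kind of object studied here can be simulated in a few lines. The sketch below assumes the product takes the form $\Gamma_n = \prod_{k=1}^{n} (I - \alpha A(Z_k))$ for a Markov chain $(Z_k)$, which is the form that arises in constant-stepsize linear stochastic approximation; the truncated abstract does not spell this out, so the notation, the two-state chain, the matrices in `A`, and the function names are all hypothetical. The example is chosen so that `A[1]` alone would make the product grow along one direction, while the stationary average of `A` is stable.

```python
import math
import random

# Hypothetical matrix-valued function on a two-state chain (not from the paper).
# I - alpha * A[1] expands the first coordinate, but the stationary average
# of A has positive diagonal entries, so the product contracts on average.
A = {
    0: ((2.0, 1.0), (0.0, 0.5)),
    1: ((-0.5, 0.0), (0.0, 1.0)),
}
STAY_PROB = 0.9  # probability of remaining in the current state


def matmul2(x, y):
    """Multiply two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def step_matrix(z, alpha):
    """Return I - alpha * A[z]."""
    a = A[z]
    return tuple(
        tuple((1.0 if i == j else 0.0) - alpha * a[i][j] for j in range(2))
        for i in range(2)
    )


def product_norm(alpha, n_steps, seed=0):
    """Frobenius norm of the product of n_steps Markov-driven factors."""
    rng = random.Random(seed)
    z = 0
    gamma = ((1.0, 0.0), (0.0, 1.0))  # start from the identity
    for _ in range(n_steps):
        if rng.random() > STAY_PROB:
            z = 1 - z  # switch state
        gamma = matmul2(step_matrix(z, alpha), gamma)
    return math.sqrt(sum(v * v for row in gamma for v in row))
```

Calling `product_norm(0.1, n)` for increasing `n` should show the norm shrinking roughly geometrically, despite temporary growth during long visits to state 1. This is only a single-trajectory illustration on a bounded, finite chain; the paper's result concerns $p$-th moments and covers unbounded state spaces, which this toy example does not exercise.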

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization