Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning

Prasenjit Karmakar

arXiv:2012.00805·cs.LG·December 3, 2020

TL;DR

This paper develops a novel asymptotic convergence analysis for two time-scale stochastic approximation algorithms driven by controlled Markov noise, with applications to reinforcement learning, in particular policy evaluation.

Contribution

It introduces a new framework for analyzing stochastic approximation with Markov noise, including convergence proofs and error bounds for policy evaluation in reinforcement learning.
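For orientation, coupled fast/slow recursions of the kind described in the abstract are commonly written in the form below. This is a generic sketch, not a quotation of the paper; the symbols h, g, a(n), b(n), M, Z, A and p are notation introduced here for illustration.

```latex
\begin{aligned}
x_{n+1} &= x_n + a(n)\Big[h\big(x_n, y_n, Z^{(1)}_n\big) + M^{(1)}_{n+1}\Big],\\
y_{n+1} &= y_n + b(n)\Big[g\big(x_n, y_n, Z^{(2)}_n\big) + M^{(2)}_{n+1}\Big],
\qquad \frac{b(n)}{a(n)} \to 0,\\
\mathbb{P}\big(Z^{(i)}_{n+1} \in B \,\big|\, Z^{(i)}_m, A^{(i)}_m, x_m, y_m,\ m \le n\big)
&= \int_B p^{(i)}\big(dz \,\big|\, Z^{(i)}_n, A^{(i)}_n, x_n, y_n\big), \qquad i = 1, 2.
\end{aligned}
```

Here x is the faster iterate and y the slower one, M^{(i)} are martingale difference sequences, and Z^{(i)} are the Markov noise processes. The noise is "controlled" because its transition kernel p^{(i)} may depend on a control A^{(i)}_n and on the current iterates, and "non-additive" because it enters through h and g rather than as a separate additive term.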

Findings

1. Proves almost sure convergence of stochastic approximation algorithms with controlled Markov noise.
2. Provides error bounds for policy evaluation with risk-sensitive cost functions.
3. Extends lock-in probability analysis to iterates that are not known in advance to be stable.

Abstract

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise. In particular, the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Using a special case of our results, we present a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation. We compile several aspects of the dynamics of stochastic approximation algorithms with Markov iterate-dependent noise when the iterates are not known to be stable beforehand. We achieve the same by extending…
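The fast/slow structure described in the abstract can be illustrated with a toy simulation. This is a hypothetical sketch, not the paper's construction. The two-state Markov chain, the noise functions c and d, the step sizes a_n = (n+1)^-0.6 and b_n = 1/(n+1), and the Gaussian martingale noise are all illustrative assumptions. For simplicity the chain's transition kernel is held fixed rather than controlled by the iterates.

```python
import random


def two_timescale_sa(n_iter: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Toy two time-scale SA driven by a two-state Markov chain (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical Markov noise: P(0 -> 1) = p, P(1 -> 0) = q.
    # Stationary distribution: pi = (q, p) / (p + q) = (0.25, 0.75).
    p, q = 0.3, 0.1
    c = (0.0, 2.0)  # noise term in the fast recursion, E_pi[c] = 1.5
    d = (4.0, 0.0)  # noise term in the slow recursion, E_pi[d] = 1.0
    x, y, z = 0.0, 0.0, 0
    for n in range(n_iter):
        a_n = 1.0 / (n + 1) ** 0.6  # fast step size
        b_n = 1.0 / (n + 1)         # slow step size, so b_n / a_n -> 0
        # Fast iterate: tracks lambda(y) = y + E_pi[c] for quasi-static y.
        x_next = x + a_n * (y + c[z] - x + rng.gauss(0.0, 0.5))
        # Slow iterate: sees x as (approximately) equilibrated at lambda(y).
        y_next = y + b_n * (d[z] - x + rng.gauss(0.0, 0.5))
        x, y = x_next, y_next
        # Advance the Markov noise process by one transition.
        if z == 0:
            z = 1 if rng.random() < p else 0
        else:
            z = 0 if rng.random() < q else 1
    return x, y


if __name__ == "__main__":
    x, y = two_timescale_sa()
    print(f"x ~ {x:.3f} (limit 1.0), y ~ {y:.3f} (limit -0.5)")
```

Because b_n / a_n -> 0, the fast iterate treats y as nearly constant and settles near lambda(y) = y + E_pi[c]. The slow iterate then follows dy/dt = E_pi[d] - lambda(y), whose equilibrium is y = -0.5, which gives x = 1.0. Both limits are averages under the chain's stationary distribution pi = (0.25, 0.75). This is roughly the role that ergodic occupation measures play in the general, controlled setting, where the limiting dynamics become differential inclusions.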

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Simulation Techniques and Applications