Asymptotic and Finite Sample Analysis of Nonexpansive Stochastic Approximations with Markovian Noise
Ethan Blaser, Shangtong Zhang

TL;DR
This paper analyzes the convergence of nonexpansive stochastic approximation algorithms with Markovian noise, extending the theory beyond contractive operators, and applies the results to average reward temporal difference learning.
Contribution
It provides the first finite sample and asymptotic analysis for nonexpansive stochastic approximations with Markovian noise, including convergence of average reward TD learning.
Findings
Established novel bounds for noise terms using the Poisson equation.
Proved convergence of classical tabular average reward TD learning to a sample-path dependent fixed point.
Extended stochastic approximation analysis to nonexpansive operators in reinforcement learning.
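For readers unfamiliar with the setting, the findings above can be read against the following standard formulation. This is a sketch under assumptions, not the paper's exact notation: the symbols $x_n$, $Y_n$, $H$, $h$, $\alpha_n$, $P$, $d$ and $\nu$ are chosen here for illustration, and the paper's precise assumptions and norms may differ.

```latex
% Stochastic approximation driven by a Markov chain (Y_n) with
% transition kernel P and stationary distribution d (assumed notation):
\[
  x_{n+1} = x_n + \alpha_{n+1}\bigl(H(x_n, Y_{n+1}) - x_n\bigr),
  \qquad
  h(x) \doteq \mathbb{E}_{y \sim d}\bigl[H(x, y)\bigr].
\]
% Contractive case: \|h(x) - h(x')\| \le \gamma \|x - x'\| with \gamma < 1.
% Nonexpansive case (the setting studied here): only
\[
  \|h(x) - h(x')\| \le \|x - x'\| .
\]
% Poisson equation used to control the Markovian noise H(x, y) - h(x):
\[
  \nu(x, y) - \sum_{y'} P(y, y')\,\nu(x, y') = H(x, y) - h(x).
\]
```

A nonexpansive $h$ can have many fixed points, which is consistent with the limit depending on the sample path rather than being a single predetermined point.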
Abstract
Stochastic approximation is a powerful class of algorithms with celebrated success. However, a large body of previous analysis focuses on stochastic approximations driven by contractive operators, which is not applicable in some important reinforcement learning settings like the average reward setting. This work instead investigates stochastic approximations with merely nonexpansive operators. In particular, we study nonexpansive stochastic approximations with Markovian noise, providing both asymptotic and finite sample analysis. Key to our analysis are novel bounds of noise terms resulting from the Poisson equation. As an application, we prove for the first time that classical tabular average reward temporal difference learning converges to a sample-path dependent fixed point.
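As an illustration of the algorithm named in the abstract, here is a minimal sketch of tabular average reward TD learning on a small Markov reward process. The function name `average_reward_td`, the two-state chain, and the step-size schedules are assumptions made for this example and are not taken from the paper.

```python
import random


def average_reward_td(P, r, num_steps, seed=0):
    """Tabular average reward TD learning on a Markov reward process.

    P: row-stochastic transition matrix as a list of lists.
    r: reward received on leaving each state.
    Returns (v, J): differential value estimates and the average
    reward estimate. Step sizes below are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(P)
    states = range(n)
    v = [0.0] * n  # differential value estimates
    J = 0.0        # average reward estimate
    s = 0
    for t in range(num_steps):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(states, weights=P[s])[0]
        reward = r[s]
        alpha = 1.0 / (t + 1) ** 0.8  # value step size (assumed schedule)
        beta = 1.0 / (t + 1)          # makes J the running mean of rewards
        # Average reward TD error: no discounting, reward centered by J.
        td_error = reward - J + v[s_next] - v[s]
        v[s] += alpha * td_error
        J += beta * (reward - J)
        s = s_next
    return v, J


# Hypothetical two-state chain used only for illustration.
P_example = [[0.9, 0.1], [0.2, 0.8]]
r_example = [1.0, 0.0]
v_est, J_est = average_reward_td(P_example, r_example, num_steps=200_000)
print("average reward estimate:", J_est)
print("value difference v[0] - v[1]:", v_est[0] - v_est[1])
```

For this chain the stationary distribution is (2/3, 1/3), so the true average reward is 2/3. The differential value function is determined only up to an additive constant, so the individual entries of `v_est` can change with the seed while the difference `v_est[0] - v_est[1]` is what the data pin down. This matches the abstract's description of a sample-path dependent fixed point.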
Taxonomy
Topics: Imbalanced Data Classification Techniques · Sentiment Analysis and Opinion Mining · Educational and Technological Research
