A Concentration Bound for TD(0) with Function Approximation
Siddharth Chandak, Vivek S. Borkar

TL;DR
This paper establishes a uniform all-time concentration bound for TD(0) with linear function approximation in an online setting, addressing the challenges posed by Markov noise and by the lack of almost-sure boundedness guarantees for the iterates.
Contribution
It provides the first uniform all-time concentration bound for TD(0) with function approximation in an online Markov setting.
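For reference, the online TD(0) iteration with linear features and the general shape of a uniform all-time bound can be written as below. This is a schematic sketch in standard notation assumed here, not taken from the paper: $X_0, X_1, \dots$ is a single trajectory of a Markov chain with transition matrix $P$ and stationary distribution $\pi$, $D_\pi = \mathrm{diag}(\pi)$, $\Phi$ stacks the feature vectors $\phi(x)^\top$ as rows, $a(n)$ are step sizes, $\gamma$ is the discount factor, and $\epsilon$, $\eta$ are a tolerance and a failure probability. The paper's exact step-size conditions, constants, and the dependence of $n_0$ on $\epsilon$ and $\eta$ are not reproduced.

```latex
\begin{aligned}
\delta_n &= r(X_n) + \gamma\,\phi(X_{n+1})^\top\theta_n - \phi(X_n)^\top\theta_n,\\
\theta_{n+1} &= \theta_n + a(n)\,\delta_n\,\phi(X_n),\\
A\theta^* &= b, \qquad A = \Phi^\top D_\pi (I - \gamma P)\Phi,\quad b = \Phi^\top D_\pi r,\\
\mathbb{P}\bigl(\|\theta_n - \theta^*\| \le \epsilon \ \text{for all } n \ge n_0\bigr) &\ge 1 - \eta.
\end{aligned}
```

"Uniform" here means the event holds simultaneously for every $n \ge n_0$, which is stronger than bounding $\mathbb{P}(\|\theta_n - \theta^*\| > \epsilon)$ separately for each fixed $n$.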
Findings
Derived a uniform concentration bound for TD(0) with linear function approximation.
Handled Markov noise using the Poisson equation.
Addressed the lack of almost-sure boundedness guarantees for the iterates using relaxed concentration inequalities.
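The Poisson-equation step can be illustrated for a scalar function of the state. The sketch below assumes a hypothetical irreducible 3-state chain and function `f` (neither is from the paper), and the helper names `stationary_distribution` and `solve_poisson` are illustrative. Solving h − Ph = f − π(f) rewrites the Markov noise f(X_k) − π(f) as a martingale difference h(X_{k+1}) − (Ph)(X_k) plus a telescoping term h(X_k) − h(X_{k+1}). In the paper the noise also depends on the iterate, so this shows only the basic mechanism.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of an irreducible transition matrix P (pi P = pi)."""
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()

def solve_poisson(P, f):
    """Hypothetical helper: solve h - P h = f - pi(f), normalized so that pi(h) = 0.

    Uses the fundamental matrix Z = (I - P + 1 pi^T)^{-1}.
    """
    S = P.shape[0]
    pi = stationary_distribution(P)
    g = f - pi @ f                                  # centred function, pi(g) = 0
    Z = np.linalg.inv(np.eye(S) - P + np.outer(np.ones(S), pi))
    return Z @ g, pi

# Hypothetical 3-state chain and function, for illustration only.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
f = np.array([1.0, -2.0, 3.0])
h, pi = solve_poisson(P, f)

# Simulate one trajectory and check the exact decomposition
#   sum_k (f(X_k) - pi(f)) = sum_k M_{k+1} + h(X_0) - h(X_n),
# where M_{k+1} = h(X_{k+1}) - (P h)(X_k) is a martingale difference.
rng = np.random.default_rng(0)
n = 1000
xs = [0]
for _ in range(n):
    xs.append(rng.choice(3, p=P[xs[-1]]))
xs = np.array(xs)
Ph = P @ h
lhs = np.sum(f[xs[:-1]] - pi @ f)
martingale = np.sum(h[xs[1:]] - Ph[xs[:-1]])
rhs = martingale + h[xs[0]] - h[xs[-1]]
print(lhs, rhs)
```

The martingale sum is what standard martingale concentration tools can handle, while the boundary term h(X_0) − h(X_n) stays bounded for a finite chain.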
Abstract
We derive a uniform all-time concentration bound of the type 'for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm with both martingale and Markov noise. Markov noise is handled using the Poisson equation, and the lack of almost-sure guarantees on the boundedness of the iterates is handled using the concept of relaxed concentration inequalities.
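The online setting described in the abstract can be illustrated with a small simulation. This is not the authors' code: it assumes a hypothetical 3-state chain, rewards, linear features, discount factor and step-size schedule, and the helper names `td0_online`, `td_fixed_point` and `stationary_distribution` are illustrative. TD(0) runs along one simulated sample path, so successive samples are correlated rather than independent draws from the stationary distribution, and the final iterate is compared with the TD fixed point θ* solving Aθ* = b, where A = Φᵀ D_π (I − γP) Φ and b = Φᵀ D_π r.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of an irreducible transition matrix P (pi P = pi)."""
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()

def td_fixed_point(P, r, phi, gamma):
    """Hypothetical helper: solve A theta = b, A = Phi^T D (I - gamma P) Phi, b = Phi^T D r."""
    D = np.diag(stationary_distribution(P))
    A = phi.T @ D @ (np.eye(P.shape[0]) - gamma * P) @ phi
    b = phi.T @ D @ r
    return np.linalg.solve(A, b)

def td0_online(P, r, phi, gamma, n_steps, step_size, x0=0, seed=0):
    """Hypothetical helper: online TD(0) with linear features along ONE trajectory."""
    rng = np.random.default_rng(seed)
    cdf = np.cumsum(P, axis=1)
    cdf[:, -1] = 1.0                    # guard against round-off in the row sums
    u = rng.random(n_steps)             # uniforms used to sample successive states
    theta = np.zeros(phi.shape[1])
    x = x0
    for n in range(n_steps):
        # X_{n+1} is drawn given X_n, so samples are correlated (Markov noise).
        x_next = int(np.searchsorted(cdf[x], u[n], side="right"))
        td_error = r[x] + gamma * phi[x_next] @ theta - phi[x] @ theta
        theta = theta + step_size(n) * td_error * phi[x]
        x = x_next
    return theta

# Hypothetical example problem, for illustration only.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, 2.0])
phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])
gamma = 0.5

theta_star = td_fixed_point(P, r, phi, gamma)
theta_n = td0_online(P, r, phi, gamma, n_steps=200_000,
                     step_size=lambda n: 1.0 / (n + 10) ** 0.8)
err = np.linalg.norm(theta_n - theta_star)
print(theta_star, theta_n, err)
```

Rerunning with different seeds, or recording the error along the whole trajectory, gives an informal feel for the 'for all $n \geq n_0$' behavior that the paper's bound makes precise; a single simulated run is not itself a concentration bound.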
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms · Bayesian Methods and Mixture Models
