A Concentration Bound for TD(0) with Function Approximation

Siddharth Chandak; Vivek S. Borkar

arXiv:2312.10424·cs.LG·January 13, 2026·1 cites

Open Access

TL;DR

This paper establishes a uniform concentration bound for TD(0) with linear function approximation in an online setting, addressing challenges from Markov noise and lack of boundedness guarantees.

Contribution

The paper provides the first uniform all-time concentration bound for TD(0) with linear function approximation in the online setting, where samples come from a single path of the Markov chain.

Findings

01

Derived a uniform concentration bound for TD(0) with linear function approximation.

02

Handled Markov noise using the Poisson equation.

03

Addressed the lack of boundedness guarantees with relaxed concentration inequalities.
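
As a rough illustration of the second finding, the standard Poisson-equation device for Markov noise can be written as follows. The notation here is an assumption for illustration and is not taken from the paper: $X_n$ is an ergodic finite-state chain with transition probabilities $p(j \mid i)$ and stationary distribution $\pi$, $f$ is a function on the state space, and $V$ is a solution of the Poisson equation.

```latex
% Poisson equation for f (illustrative notation, not the paper's):
V(i) = f(i) - \sum_{j} \pi(j) f(j) + \sum_{j} p(j \mid i)\, V(j).

% The Markov noise term then splits into a martingale difference
% and a telescoping remainder:
f(X_n) - \sum_{j} \pi(j) f(j)
  = \underbrace{V(X_{n+1}) - \sum_{j} p(j \mid X_n)\, V(j)}_{\text{martingale difference}}
  + \underbrace{V(X_n) - V(X_{n+1})}_{\text{telescoping term}}.
```

The first bracket has zero conditional mean given the past, so martingale concentration tools apply to it, while the second sums to a boundary term along the trajectory.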

Abstract

We derive a uniform all-time concentration bound of the type 'for all $n \geq n_{0}$ for some $n_{0}$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm with both martingale and Markov noise. The Markov noise is handled using the Poisson equation, and the lack of almost sure guarantees on the boundedness of the iterates is handled using the concept of relaxed concentration inequalities.
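
The setting described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' experiment: the 3-state chain `P`, rewards `r`, features `Phi`, discount `gamma`, step sizes `a_n = 1/(n+1)^0.75`, and the threshold `n0` are all hypothetical choices. The sketch runs TD(0) along one trajectory and reports the largest error over all `n >= n0`, which is the kind of quantity a uniform all-time bound controls.

```python
import numpy as np

def stationary_distribution(P):
    # Left eigenvector of P for eigenvalue 1, normalised to sum to 1.
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()

def td_fixed_point(P, r, Phi, gamma):
    # Solves Phi^T D (I - gamma P) Phi theta = Phi^T D r, with D = diag(pi).
    D = np.diag(stationary_distribution(P))
    A = Phi.T @ D @ (np.eye(P.shape[0]) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)

def td0_single_path(P, r, Phi, gamma, n_steps, seed=0):
    # Online TD(0) with linear function approximation on ONE sample path.
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    cdf = np.cumsum(P, axis=1)
    u = rng.random(n_steps)
    thetas = np.zeros((n_steps + 1, d))
    theta = np.zeros(d)
    s = 0
    for n in range(n_steps):
        # Sample the next state from row s of P by inverting its CDF.
        s_next = min(int(np.searchsorted(cdf[s], u[n], side="right")), n_states - 1)
        a_n = 1.0 / (n + 1) ** 0.75  # assumed step-size schedule
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + a_n * td_error * Phi[s]
        thetas[n + 1] = theta
        s = s_next
    return thetas

# Hypothetical toy problem (not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.6, 0.8]])
gamma = 0.5

theta_star = td_fixed_point(P, r, Phi, gamma)
thetas = td0_single_path(P, r, Phi, gamma, n_steps=100_000)
errors = np.linalg.norm(thetas - theta_star, axis=1)

n0 = 50_000
tail_sup = errors[n0:].max()  # sup over n >= n0 of ||theta_n - theta*||
print("theta* =", theta_star)
print("final iterate =", thetas[-1])
print("sup error for n >= n0:", tail_sup)
```

Because consecutive samples come from the same trajectory, they are correlated, unlike i.i.d. draws from the stationary distribution. A concentration bound of the 'for all $n \geq n_{0}$' type gives a high-probability bound on a quantity like `tail_sup` rather than on the error at a single time step.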

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms · Bayesian Methods and Mixture Models