Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains

Maik Overmars; Jasper Goseling; Richard Boucherie

arXiv:2510.25514 · stat.ML · April 2, 2026

TL;DR

This paper proves convergence of off-policy TD(0) with linear function approximation for reversible Markov chains, providing explicit bounds and using a modified stochastic approximation framework.

Contribution

It establishes convergence guarantees for the standard, unmodified off-policy TD(0) algorithm under a reversibility assumption on the Markov chain, and improves on existing results by providing explicit bounds.
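A minimal sketch of the standard off-policy TD(0) update with linear features. It assumes a sampling model in which each state s is drawn i.i.d. from a behaviour distribution mu that differs from the chain's stationary distribution, and the next state is drawn from the transition matrix P. The function name `off_policy_td0`, the step-size schedule `c / (t + t0)` and the three-state example chain are hypothetical illustrations, not the paper's setup. The update carries no importance-sampling weight, which is what makes it the "standard" algorithm rather than a reweighted variant.

```python
import numpy as np


def off_policy_td0(P, r, phi, mu, gamma, steps=50_000, c=4.0, t0=100.0, seed=0):
    """Standard linear TD(0), no importance weights (illustrative sketch).

    Assumed sampling model: s_t ~ mu i.i.d. (behaviour distribution) and
    s_t' ~ P[s_t, :]. Step sizes c / (t + t0) satisfy the Robbins-Monro conditions.
    """
    rng = np.random.default_rng(seed)
    n, k = phi.shape
    states = rng.choice(n, size=steps, p=mu)
    # Inverse-CDF sampling of each next state from the row P[s, :].
    cum = np.cumsum(P, axis=1)
    u = rng.random(steps)
    next_states = np.minimum((cum[states] <= u[:, None]).sum(axis=1), n - 1)

    theta = np.zeros(k)
    for t in range(steps):
        s, s_next = states[t], next_states[t]
        delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta  # TD error
        theta += c / (t + t0) * delta * phi[s]  # plain TD(0) step, no reweighting
    return theta


# A reversible 3-state birth-death chain; its stationary distribution is
# (0.25, 0.5, 0.25), so sampling states from mu below is off-policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
r = np.array([1.0, 0.0, 0.0])
mu = np.array([0.6, 0.3, 0.1])  # hypothetical behaviour distribution
theta = off_policy_td0(P, r, np.eye(3), mu, gamma=0.5)
V_exact = np.linalg.solve(np.eye(3) - 0.5 * P, r)  # (I - gamma P)^{-1} r
```

With tabular features (`np.eye(3)`) the TD fixed point equals the exact discounted value (I − γP)⁻¹ r for any behaviour distribution with full support, so the output is easy to check. With fewer features than states the limit, when it exists, is only an approximation, and off-policy sampling is exactly where stability can fail.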

Findings

1. Convergence with probability one to a limit with zero projected Bellman error.
2. An explicit upper bound on the discount factor under which convergence is guaranteed.
3. Application to reversible Markov chains such as random walks.
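Reversibility of a chain with stationary distribution π means detailed balance, π_i P_ij = π_j P_ji for all states i, j. The sketch below checks that condition numerically; the helper names are hypothetical, and the eigenvector approach assumes the chain has a unique stationary distribution. A simple random walk on a connected undirected graph is always reversible, with π_i proportional to the degree of vertex i, whereas a deterministic cycle is not.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to one.

    Assumes the stationary distribution is unique (e.g. an irreducible chain).
    """
    eigvals, eigvecs = np.linalg.eig(P.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()


def is_reversible(P, atol=1e-10):
    """Detailed balance: pi_i P_ij == pi_j P_ji for all i, j."""
    pi = stationary_distribution(P)
    flow = pi[:, None] * P  # flow[i, j] = pi_i * P_ij
    return np.allclose(flow, flow.T, atol=atol)


# Simple random walk on an undirected graph: P_ij = A_ij / deg(i).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
P_walk = adj / adj.sum(axis=1, keepdims=True)

# A deterministic 3-cycle is not reversible: probability flows one way only.
P_cycle = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0]])

print(is_reversible(P_walk), is_reversible(P_cycle))  # prints: True False
```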

Abstract

We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function approximation can lead to divergence of the algorithm. Existing results for this setting modify the algorithm, for instance by reweighting the updates using importance sampling. This establishes convergence at the expense of additional complexity. In contrast, our approach is to analyse the standard algorithm, but to restrict our attention to the class of reversible Markov chains. We demonstrate convergence under this mild reversibility condition on the structure of the chain, which in many applications can be assumed using domain knowledge. In particular, we establish a convergence guarantee under an upper bound on the discount factor in terms of the difference…
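To see how the discount factor and the sampling distribution interact, one can look at the mean dynamics of the update. Under an assumed sampling model (s ~ μ, s' ~ P(s, ·)), the expected TD(0) step is Aθ + b with A = Φᵀ D_μ (γP − I) Φ and b = Φᵀ D_μ r. If A is Hurwitz, the mean ODE is stable and its equilibrium θ* = −A⁻¹ b has zero projected Bellman error in the μ-weighted norm. The helper names and both example chains are hypothetical; the two-state example with features 1 and 2 is in the spirit of the well-known θ → 2θ divergence example, not taken from the paper, and the sketch does not attempt to reproduce the paper's bound on γ.

```python
import numpy as np


def expected_td_system(P, r, phi, mu, gamma):
    """Mean TD(0) step A @ theta + b under the assumed model s ~ mu, s' ~ P[s]."""
    D = np.diag(mu)
    A = phi.T @ D @ (gamma * P - np.eye(len(r))) @ phi
    b = phi.T @ D @ r
    return A, b


def is_stable(A):
    """d(theta)/dt = A theta + b has a globally stable equilibrium iff A is Hurwitz."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))


def projected_bellman_error(P, r, phi, mu, gamma, theta):
    """mu-weighted norm of Pi_mu(T V - V), with V = Phi theta and T V = r + gamma P V."""
    D = np.diag(mu)
    V = phi @ theta
    resid = r + gamma * P @ V - V
    # Weighted least-squares projection of the residual onto span(Phi).
    proj = phi @ np.linalg.solve(phi.T @ D @ phi, phi.T @ D @ resid)
    return float(np.sqrt(proj @ D @ proj))


# Stable case: a reversible 3-state birth-death chain with two features and a
# hypothetical behaviour distribution (the stationary one is (0.25, 0.5, 0.25)).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
r = np.array([1.0, 0.0, 0.0])
phi = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
mu = np.array([0.6, 0.3, 0.1])
A, b = expected_td_system(P, r, phi, mu, gamma=0.5)
theta_star = np.linalg.solve(A, -b)  # TD fixed point: A theta + b = 0
pbe = projected_bellman_error(P, r, phi, mu, 0.5, theta_star)

# Unstable case: both states move to state 2, single feature phi = (1, 2).
P2 = np.array([[0.0, 1.0], [0.0, 1.0]])
phi2 = np.array([[1.0], [2.0]])
A_off, _ = expected_td_system(P2, np.zeros(2), phi2, np.array([0.9, 0.1]), 0.9)
# mu close to this chain's stationary distribution (0, 1).
A_mostly_on, _ = expected_td_system(P2, np.zeros(2), phi2, np.array([0.1, 0.9]), 0.9)

print(is_stable(A), pbe < 1e-10, is_stable(A_off), is_stable(A_mostly_on))
# prints: True True False True
```

In the two-state example, sampling mostly state 1 gives A = 0.68 > 0 at γ = 0.9, so the expected update pushes θ away from the fixed point, while sampling mostly state 2 gives A = −0.28. The same features and chain converge or diverge depending only on μ and γ.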
