Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation

Han-Dong Lim; Donghwan Lee

arXiv:2502.08941·cs.LG·February 24, 2026

Open Access

TL;DR

This paper investigates the convergence of off-policy $n$-step TD-learning algorithms with linear function approximation. It shows that these algorithms converge once the sampling horizon $n$ is sufficiently large, and it connects model-based and model-free RL methods.

Contribution

The paper provides a theoretical analysis showing that $n$-step TD algorithms converge in off-policy settings when $n$ is sufficiently large, linking model-based deterministic algorithms to their model-free stochastic counterparts.
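The deterministic, model-based side of this analysis can be illustrated with a small numerical sketch of $n$-step projected value iteration. This is not the authors' code. The random MDP, the feature matrix `Phi`, the state weighting `d` (standing in for a behavior-policy state distribution that need not match the target-policy transition matrix `P`), and the helper names are all assumptions made for illustration. The iteration is written as $\theta \leftarrow A_n \theta + b_n$, and the sketch computes the spectral radius of $A_n$ for several values of $n$.

```python
import numpy as np


def n_step_pvi_matrices(P, R, Phi, d, gamma, n):
    """Return (A, b) so that n-step projected value iteration reads theta <- A @ theta + b.

    P     : target-policy state transition matrix (assumed known, model-based setting)
    R     : expected one-step reward per state
    Phi   : feature matrix, one row per state
    d     : state weighting used in the projection (hypothetical behavior distribution)
    """
    D = np.diag(d)
    G = Phi.T @ D @ Phi  # weighted Gram matrix, invertible if Phi has full column rank
    # n-step discounted reward: sum_{i=0}^{n-1} gamma^i P^i R
    Rn = np.zeros_like(R)
    P_power = np.eye(len(R))
    for i in range(n):
        Rn += gamma**i * (P_power @ R)
        P_power = P_power @ P
    # After the loop, P_power equals P^n.
    A = gamma**n * np.linalg.solve(G, Phi.T @ D @ P_power @ Phi)
    b = np.linalg.solve(G, Phi.T @ D @ Rn)
    return A, b


def spectral_radius(A):
    return float(np.max(np.abs(np.linalg.eigvals(A))))


# Hypothetical toy problem: 6 states, 2 features, discount 0.9.
rng = np.random.default_rng(0)
num_states, num_features, gamma = 6, 2, 0.9
P = rng.random((num_states, num_states))
P /= P.sum(axis=1, keepdims=True)
R = rng.random(num_states)
Phi = rng.standard_normal((num_states, num_features))
d = rng.uniform(0.5, 1.5, num_states)
d /= d.sum()

# Spectral radius of the iteration matrix for several horizons n.
radii = {
    n: spectral_radius(n_step_pvi_matrices(P, R, Phi, d, gamma, n)[0])
    for n in (1, 5, 20, 50)
}

# Run the iteration for a large horizon and compare with the exact fixed point.
A, b = n_step_pvi_matrices(P, R, Phi, d, gamma, 50)
theta = np.zeros(num_features)
for _ in range(100):
    theta = A @ theta + b
theta_star = np.linalg.solve(np.eye(num_features) - A, b)
```

Because the projection is non-expansive in the `d`-weighted norm and $\|P^n\|$ stays bounded in that norm, the factor $\gamma^n$ eventually forces the spectral radius of $A_n$ below one, so the iteration converges for large enough $n$ even when `d` is unrelated to `P`. For small $n$ the radius may or may not be below one, depending on the problem instance.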

Findings

01. $n$-step TD algorithms converge with sufficiently large $n$

02. Analysis connects model-based and model-free RL methods

03. Provides theoretical guarantees for off-policy TD learning

Abstract

This paper analyzes multi-step temporal difference (TD)-learning algorithms within the "deadly triad" scenario, characterized by linear function approximation, off-policy learning, and bootstrapping. In particular, we prove that $n$-step TD-learning algorithms converge to a solution when the sampling horizon $n$ is sufficiently large. The paper is divided into two parts. In the first part, we comprehensively examine the fundamental properties of their model-based deterministic counterparts, including projected value iteration and gradient descent algorithms, which can be viewed as prototype deterministic algorithms whose analysis plays a pivotal role in understanding and developing their model-free reinforcement learning counterparts. In particular, we prove that these algorithms converge to meaningful solutions when $n$ is sufficiently large. Based on these findings, in the second part,…

Taxonomy

Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics