Towards Parameter-Free Temporal Difference Learning

Yunxiang Li; Mark Schmidt; Reza Babanezhad; Sharan Vaswani

arXiv:2603.02577 · cs.LG · March 4, 2026

Open Access

TL;DR

This paper introduces a parameter-free TD(0) algorithm with exponential step-size schedules that achieves optimal convergence rates in both i.i.d. and Markovian sampling regimes without requiring problem-dependent parameters or impractical modifications.

Contribution

It proposes a new parameter-free TD(0) method with exponential step-sizes that works under realistic sampling conditions and removes the need for prior knowledge of problem-specific quantities.

Findings

01 Achieves the optimal bias-variance trade-off in the i.i.d. setting.

02 Converges at a rate comparable to prior methods in the Markovian setting.

03 Does not require projections, iterate averaging, or knowledge of the mixing time.

Abstract

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However, they often require setting the algorithm parameters using problem-dependent quantities that are difficult to estimate in practice -- such as the minimum eigenvalue of the feature covariance (\(\omega\)) or the mixing time of the underlying Markov chain (\(\tau_{\text{mix}}\)). In addition, some analyses rely on nonstandard and impractical modifications, exacerbating the gap between theory and practice. To address these limitations, we use an exponential step-size schedule with the standard TD(0) algorithm. We analyze the resulting method under two sampling regimes: independent and identically distributed (i.i.d.) sampling from the stationary…
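As a rough illustration of the setup the abstract describes, the sketch below runs standard TD(0) with linear function approximation along a single trajectory (the Markovian regime), with no projection and no iterate averaging. The abstract as shown here does not state the schedule's exact form, so the code assumes a commonly used exponential step-size, alpha_t = alpha0 * gamma^t with gamma = (1/T)^(1/T). That form, the function names, the initial step-size alpha0, and the toy three-state chain are all illustrative assumptions, not the paper's algorithm or constants.

```python
import numpy as np


def exponential_step_sizes(alpha0, T):
    """Assumed schedule (not taken from the paper): alpha_t = alpha0 * gamma**t,
    with gamma = (1/T)**(1/T), so the step-size decays from alpha0 towards alpha0/T."""
    gamma = (1.0 / T) ** (1.0 / T)
    return alpha0 * gamma ** np.arange(T)


def td0_linear(features, P, rewards, discount, T, alpha0=0.5, seed=0):
    """Plain TD(0) with linear value estimates V(s) = features[s] @ theta.

    features: (n_states, d) feature matrix
    P:        (n_states, n_states) transition matrix of the Markov reward process
    rewards:  (n_states,) reward received in each state
    States are sampled along one trajectory (Markovian sampling).
    """
    rng = np.random.default_rng(seed)
    n_states, d = features.shape
    theta = np.zeros(d)
    alphas = exponential_step_sizes(alpha0, T)
    s = rng.integers(n_states)
    for t in range(T):
        s_next = rng.choice(n_states, p=P[s])
        # TD error: r(s) + discount * V(s') - V(s)
        td_error = rewards[s] + discount * features[s_next] @ theta - features[s] @ theta
        # Semi-gradient update; no projection step and no averaging of iterates
        theta = theta + alphas[t] * td_error * features[s]
        s = s_next
    return theta


if __name__ == "__main__":
    # Hypothetical 3-state chain with tabular (one-hot) features.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.25, 0.5, 0.25],
                  [0.0, 0.5, 0.5]])
    r = np.array([1.0, 0.0, -1.0])
    theta = td0_linear(np.eye(3), P, r, discount=0.9, T=20000)
    exact = np.linalg.solve(np.eye(3) - 0.9 * P, r)  # exact values for comparison
    print("TD(0) estimate:", theta)
    print("exact values:  ", exact)
```

In this sketch the only inputs to the schedule are the horizon T and the initial step-size alpha0; neither the minimum eigenvalue \(\omega\) nor the mixing time \(\tau_{\text{mix}}\) appears anywhere in the code.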

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms