Towards Parameter-Free Temporal Difference Learning
Yunxiang Li, Mark Schmidt, Reza Babanezhad, Sharan Vaswani

TL;DR
This paper introduces a parameter-free TD(0) algorithm with an exponential step-size schedule. It attains the optimal bias-variance trade-off under i.i.d. sampling and a rate comparable to prior methods under Markovian sampling, without requiring problem-dependent parameters or impractical modifications.
Contribution
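It proposes a parameter-free TD(0) method with exponential step-sizes that works under both i.i.d. and Markovian sampling and removes the need for prior knowledge of problem-dependent quantities such as the minimum eigenvalue of the feature covariance (\(\omega\)) or the mixing time (\(\tau_{\text{mix}}\)).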
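This summary does not give the formula for the exponential schedule. A common form from the SGD literature is \(\eta_t = \eta_0 \alpha^t\) with \(\alpha = (\beta/T)^{1/T}\), and the sketch below assumes that form; `eta0` and `beta` are hypothetical knobs, and the paper's exact constants may differ. The point it illustrates is that the decay factor depends only on the horizon \(T\), not on \(\omega\) or \(\tau_{\text{mix}}\).

```python
def exponential_step_sizes(eta0: float, T: int, beta: float = 1.0) -> list[float]:
    """Return [eta_1, ..., eta_T] with eta_t = eta0 * alpha**t, alpha = (beta / T) ** (1 / T).

    Assumed schedule (not taken from the paper): eta0 sets the initial scale and
    beta the final scale; the decay factor alpha depends only on the horizon T,
    never on problem quantities such as omega or tau_mix.
    """
    if T < 1 or eta0 <= 0 or beta <= 0:
        raise ValueError("need T >= 1, eta0 > 0, beta > 0")
    alpha = (beta / T) ** (1.0 / T)
    # Geometric decay: the last step is eta0 * alpha**T = eta0 * beta / T.
    return [eta0 * alpha**t for t in range(1, T + 1)]


if __name__ == "__main__":
    steps = exponential_step_sizes(eta0=1.0, T=100)
    print(steps[0], steps[-1])
```

For \(T = 100\) and \(\beta = 1\), \(\alpha = 0.01^{0.01} \approx 0.955\), so the step-size shrinks by about 4.5% per iteration and ends at \(\eta_0/100\).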
Findings
Achieves the optimal bias-variance trade-off in the i.i.d. setting.
Converges at a rate comparable to prior methods in the Markovian setting.
Does not require projections, iterate averaging, or knowledge of the mixing time.
Abstract
Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However, they often require setting the algorithm parameters using problem-dependent quantities that are difficult to estimate in practice, such as the minimum eigenvalue of the feature covariance (\(\omega\)) or the mixing time of the underlying Markov chain (\(\tau_{\text{mix}}\)). In addition, some analyses rely on nonstandard and impractical modifications, exacerbating the gap between theory and practice. To address these limitations, we use an exponential step-size schedule with the standard TD(0) algorithm. We analyze the resulting method under two sampling regimes: independent and identically distributed (i.i.d.) sampling from the stationary…
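To make the two sampling regimes concrete, here is a minimal sketch (not the authors' code) of standard TD(0) with linear function approximation and an exponential step-size schedule on a small synthetic Markov reward process. Assumptions: the schedule \(\eta_t = \eta_0 \alpha^t\) with \(\alpha = (\beta/T)^{1/T}\) is assumed, not taken from this page; the helpers `make_mrp`, `td_fixed_point`, and `td0_exp_steps` and the constants `gamma`, `T`, `eta0`, `beta` are hypothetical; features are scaled to Euclidean norm at most 1 so that a fixed `eta0` is reasonable. In the i.i.d. regime each state is drawn from the stationary distribution \(\mu\); in the Markovian regime the iterates follow a single trajectory. Unmodified TD(0) is used in both regimes, so this does not reproduce any variant the paper may analyze. The error is measured against the TD fixed point \(A\theta^\star = b\), with \(A = \Phi^\top D(\Phi - \gamma P\Phi)\), \(b = \Phi^\top D r\), \(D = \mathrm{diag}(\mu)\).

```python
import numpy as np


def make_mrp(n_states=5, n_features=3, seed=0):
    """Random synthetic Markov reward process with linear features (illustrative only)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_states)) + 0.1  # strictly positive: irreducible, aperiodic chain
    P /= P.sum(axis=1, keepdims=True)            # row-stochastic transition matrix
    r = 1.0 + rng.random(n_states)               # expected reward in each state
    # Features: a constant column plus random columns, orthonormalized, then
    # scaled so every feature vector phi(s) has Euclidean norm at most 1.
    raw = np.column_stack([np.ones(n_states), rng.standard_normal((n_states, n_features - 1))])
    Q, _ = np.linalg.qr(raw)
    Phi = Q / np.linalg.norm(Q, axis=1).max()
    # Stationary distribution mu: left eigenvector of P for eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))]).copy()
    mu /= mu.sum()
    return P, r, Phi, mu


def td_fixed_point(P, r, Phi, mu, gamma):
    """Solve A theta = b with A = Phi^T D (Phi - gamma P Phi), b = Phi^T D r, D = diag(mu)."""
    D = np.diag(mu)
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


def td0_exp_steps(P, r, Phi, mu, gamma, T, eta0, beta=1.0, markovian=False, seed=1):
    """TD(0) with the assumed schedule eta_t = eta0 * alpha**t, alpha = (beta / T) ** (1 / T)."""
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    cdf = np.cumsum(P, axis=1)                       # per-row CDFs for sampling s' ~ P(s, .)
    alpha = (beta / T) ** (1.0 / T)                  # decay depends only on T
    u = rng.random(T)                                # uniforms used to draw next states
    iid_states = rng.choice(n_states, size=T, p=mu)  # states drawn from mu (i.i.d. regime)
    theta = np.zeros(d)
    s = int(iid_states[0])                           # both regimes start from a draw of mu
    eta = eta0
    for t in range(T):
        if not markovian:
            s = int(iid_states[t])                   # i.i.d.: fresh state from mu every step
        s_next = min(int(np.searchsorted(cdf[s], u[t], side="right")), n_states - 1)
        eta *= alpha                                 # eta_{t+1} = eta0 * alpha**(t+1)
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + eta * td_error * Phi[s]      # semi-gradient TD(0) update
        s = s_next                                   # Markovian: continue along one trajectory
    return theta


if __name__ == "__main__":
    gamma = 0.5
    P, r, Phi, mu = make_mrp()
    theta_star = td_fixed_point(P, r, Phi, mu, gamma)
    for markovian in (False, True):
        theta = td0_exp_steps(P, r, Phi, mu, gamma, T=100_000, eta0=0.25, markovian=markovian)
        label = "markovian" if markovian else "iid"
        print(label, np.linalg.norm(theta - theta_star))
```

Neither \(\omega\) nor \(\tau_{\text{mix}}\) appears anywhere in the loop; the only tuned constant is `eta0 = 0.25`, which this sketch ties to the unit bound on feature norms.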
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
