Convergence of TD(0) under Polynomial Mixing with Nonlinear Function Approximation
Anupama Sridhar, Alexander Johansen

TL;DR
This paper provides the first finite-sample, high-probability analysis of vanilla TD(0) with nonlinear function approximation under polynomial mixing, showing convergence rates comparable to i.i.d. data without requiring projections or subsampling.
Contribution
It introduces a novel analysis technique for nonlinear TD(0) under polynomial mixing, removing previous restrictions like subsampling and projections.
Findings
Convergence rates under polynomial mixing match the known rates for i.i.d. data.
High-probability bounds hold even when initialization is nonstationary.
A novel coupling method removes the need for geometric ergodicity.
Abstract
Temporal Difference Learning (TD(0)) is fundamental in reinforcement learning, yet its finite-sample behavior under non-i.i.d. data and nonlinear approximation remains unknown. We provide the first high-probability, finite-sample analysis of vanilla TD(0) on polynomially mixing Markov data, assuming only Hölder continuity and bounded generalized gradients. This breaks with previous work, which often requires subsampling, projections, or instance-dependent step-sizes. Concretely, for mixing exponent $\beta$, Hölder continuity exponent $\gamma$, and step-size decay rate $\eta$, we show that, with high probability, \[ \| \theta_t - \theta^* \| \leq C(\beta, \gamma, \eta)\, t^{-\beta/2} + C'(\gamma, \eta)\, t^{-\eta\gamma} \] after $t$ iterations. These bounds match the known i.i.d. rates and hold even when initialization is nonstationary.…
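To make the setting concrete, here is a minimal sketch of vanilla TD(0) with a nonlinear value model, run on a single Markov trajectory with no projection step and no subsampling. Everything specific in it is an assumption for illustration and is not taken from the paper: the five-state ring chain, the rewards, the model $V_\theta(s) = c \tanh(\theta_s / c)$, and the schedule $\alpha_t = \alpha_0 t^{-\eta}$ as one reading of "step-size decay rate $\eta$". The RL discount factor is named `discount` to avoid a clash with the Hölder exponent $\gamma$. A finite chain like this one mixes geometrically, so it illustrates the update rule only, not the polynomial-mixing regime the analysis targets.

```python
import numpy as np


def td0_nonlinear(n_steps=50_000, eta=0.75, alpha0=0.5, discount=0.9, seed=0):
    """Vanilla TD(0) with a tanh value model on a toy Markov chain (illustrative)."""
    rng = np.random.default_rng(seed)
    n_states, scale = 5, 10.0

    # Hypothetical toy chain: lazy random walk on a ring of 5 states.
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        P[s, s] = 0.5
        P[s, (s + 1) % n_states] = 0.25
        P[s, (s - 1) % n_states] = 0.25
    rewards = np.linspace(-1.0, 1.0, n_states)  # reward depends on current state

    # Assumed nonlinear model: V_theta(s) = scale * tanh(theta[s] / scale).
    def value(theta, s):
        return scale * np.tanh(theta[s] / scale)

    def grad(theta, s):
        g = np.zeros(n_states)
        g[s] = 1.0 - np.tanh(theta[s] / scale) ** 2
        return g

    # Reference solution, used only to measure ||theta_t - theta*||.
    v_star = np.linalg.solve(np.eye(n_states) - discount * P, rewards)
    theta_star = scale * np.arctanh(v_star / scale)

    theta = np.zeros(n_states)
    s = 0  # fixed start state, i.e. not drawn from the stationary distribution
    errors = [np.linalg.norm(theta - theta_star)]
    for t in range(1, n_steps + 1):
        s_next = rng.choice(n_states, p=P[s])
        alpha = alpha0 * t ** (-eta)  # polynomially decaying step size
        # TD error and semi-gradient update; no projection, every sample is used.
        delta = rewards[s] + discount * value(theta, s_next) - value(theta, s)
        theta = theta + alpha * delta * grad(theta, s)
        s = s_next
        errors.append(np.linalg.norm(theta - theta_star))
    return theta, theta_star, np.array(errors)


theta, theta_star, errors = td0_nonlinear()
print(f"initial error {errors[0]:.3f}, final error {errors[-1]:.3f}")
```

Plotting `errors` against `t` on log-log axes gives a rough empirical view of the decay, though a single toy run says nothing about the constants or exponents in the bound above.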
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Algorithms · Control Systems and Identification
