A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections or Strong Convexity
Wei-Cheng Lee, Francesco Orabona

TL;DR
This paper provides a new finite-time convergence analysis of projection-free TD learning with linear function approximation, demonstrating it converges without artificial boundedness assumptions even under Markovian noise.
Contribution
The paper introduces a refined analysis showing that projection-free TD learning converges at a rate of $\tilde{O}\left(\frac{\|\theta^*\|_2^2}{\sqrt{T}}\right)$ without relying on strong convexity or artificial projections.
Findings
Convergence rate of $\tilde{O}\left(\frac{\|\theta^*\|_2^2}{\sqrt{T}}\right)$
Convergence holds even with Markovian noise
Establishes a self-bounding property of TD updates
Abstract
We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in the field of reinforcement learning. We are interested in the so-called "robust" setting, where the convergence guarantee does not depend on the minimal curvature of the potential function. While prior work has established convergence guarantees in this setting, these results typically rely on the assumption that each iterate is projected onto a bounded set, a condition that is both artificial and does not match the current practice. In this paper, we challenge the necessity of such an assumption and present a refined analysis of TD learning. For the first time, we show that the simple projection-free variant converges with a rate of $\tilde{O}\left(\frac{\|\theta^*\|_2^2}{\sqrt{T}}\right)$, even in the presence of Markovian…
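To make the setting concrete, here is a minimal sketch of projection-free TD(0) with linear function approximation on a small Markov reward process. It is an illustration under stated assumptions, not the paper's Algorithm 1: the three-state chain, the feature matrix, the constant stepsize `eta = 1/sqrt(T)`, and the helper names `td0_linear` and `td_fixed_point` are all hypothetical choices made for this example.

```python
import numpy as np


def td0_linear(P, r, Phi, gamma, T, eta, seed=0):
    """Projection-free TD(0) with linear features on a Markov reward process.

    P: (n, n) transition matrix, r: (n,) rewards, Phi: (n, d) feature matrix.
    Returns the last iterate and the running average of the iterates.
    """
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    theta = np.zeros(d)
    theta_avg = np.zeros(d)
    s = rng.integers(n)  # arbitrary start state: samples are Markovian, not i.i.d.
    for t in range(T):
        s_next = rng.choice(n, p=P[s])
        # TD error for the observed transition (s, r[s], s_next)
        delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        # Semi-gradient step; there is no projection onto a bounded set
        theta = theta + eta * delta * Phi[s]
        theta_avg += (theta - theta_avg) / (t + 1)
        s = s_next
    return theta, theta_avg


def td_fixed_point(P, r, Phi, gamma):
    """Solve Phi^T D (I - gamma P) Phi theta = Phi^T D r for the TD fixed point."""
    n = P.shape[0]
    # Stationary distribution: left eigenvector of P for eigenvalue 1
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    mu = mu / mu.sum()
    D = np.diag(mu)
    A = Phi.T @ D @ (np.eye(n) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


# Hypothetical toy problem (not taken from the paper's experiments)
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([0.0, 1.0, 2.0])
Phi = np.array([[1.0, 0.0],
                [0.6, 0.8],
                [0.0, 1.0]])
gamma = 0.9
T = 20000
eta = 1.0 / np.sqrt(T)  # assumed stepsize; the paper's schedule may differ

theta_last, theta_avg = td0_linear(P, r, Phi, gamma, T, eta)
theta_star = td_fixed_point(P, r, Phi, gamma)
print("last iterate:", theta_last)
print("fixed point :", theta_star)
```

The only difference from the projected variants analyzed in prior work is the absence of a clipping step after the update, which is how TD(0) is typically run in practice.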
Peer Reviews
Decision·Submitted to ICLR 2026
- First robust, **projection-free** guarantee for linear TD(0).
- Handles Markovian bias via mixing; clear comparison to prior work.
- **The paper clearly violates the ICLR format. Page numbers are missing and the whole text has been shifted down, so it should be desk rejected.**
- Experiments are narrow; few task details in main text.
1. The authors establish non-asymptotic bounds for TD(0) with linear function approximation under Markov noise, matching practical projection-free implementations.
2. The analysis shows the iterates remain bounded in expectation without projections or strong convexity, by leveraging a self-bounding structure in the recursion.
3. With a generic stepsize schedule, the method attains $\tilde{O}\left(\frac{\|\theta^\star\|^2}{\sqrt{T}}\right)$, requiring no prior spectral/curvature informati
The authors should provide intuition for their techniques—how they were derived and how they work in the proofs.
+ Improved convergence rate results for TD(0) algorithms with linear function approximation under Markovian noise
+ Interesting finding and use of the self-bounding property of TD(0) updates
+ Numerical validations of the theoretical results
- Alg. 1 may not seem practical since the stepsize \eta_t requires prior knowledge of the horizon T, the feature bound \phi_inf, and a very large constant c>=281.
- Missing formal statement of Theorem 4.2 in the main paper. I do not believe it is right for the published main paper to contain only an informal result while the formal result appears only in the supplementary material.
- The paper compares its theoretical rate to prior results, but the experiments lack a crucial baselin
Taxonomy
Topics: Machine Learning and ELM · Speech and Audio Processing · Neural Networks and Applications
Methods: Sparse Evolutionary Training
