A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections or Strong Convexity

Wei-Cheng Lee; Francesco Orabona

arXiv:2506.01052 · cs.LG · September 26, 2025

Open Access · 3 Reviews

TL;DR

This paper provides a new finite-time convergence analysis of projection-free TD learning with linear function approximation, showing that it converges without artificial boundedness assumptions, even under Markovian noise.

Contribution

It introduces a refined analysis showing that projection-free TD learning converges at a rate of $\tilde{O}\left(\frac{\|\theta^*\|_2^2}{\sqrt{T}}\right)$ without relying on strong convexity or artificial projections.

Findings

01

Convergence rate of $\tilde{O}\left(\frac{\|\theta^*\|_2^2}{\sqrt{T}}\right)$

02

Convergence holds even with Markovian noise

03

Establishes a self-bounding property of TD updates
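Finding 01 states the rate in terms of $\|\theta^*\|_2$, the norm of the TD fixed point. As a hedged illustration, the sketch below computes $\theta^*$ for a small finite Markov chain using the standard characterization: with stationary distribution $\mu$, $D = \mathrm{diag}(\mu)$, feature matrix $\Phi$, reward vector $r$ and discount $\gamma$, the fixed point solves $A\theta^* = b$ with $A = \Phi^\top D(\Phi - \gamma P\Phi)$ and $b = \Phi^\top D r$. The chain `P`, features `PHI`, rewards `R`, `GAMMA` and the helper names are hypothetical choices for illustration, not taken from the paper, and the solve is restricted to two features to stay dependency-free.

```python
# Minimal sketch: TD fixed point theta* for a hypothetical finite Markov chain.
# The chain P, features PHI, rewards R and GAMMA are illustrative assumptions.

P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
PHI = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # phi(s) for s = 0, 1, 2
R = [1.0, 0.0, 2.0]                          # expected reward in each state
GAMMA = 0.9


def stationary_distribution(P, iters=2000):
    """Approximate mu with mu P = mu by power iteration from state 0."""
    n = len(P)
    mu = [1.0] + [0.0] * (n - 1)
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu


def td_fixed_point(P, Phi, r, gamma):
    """Solve A theta = b with A = Phi^T D (Phi - gamma P Phi), b = Phi^T D r.

    Restricted to d = 2 features so the solve is a closed-form 2x2 inverse.
    """
    n, d = len(Phi), len(Phi[0])
    assert d == 2, "this sketch only handles two features"
    mu = stationary_distribution(P)
    # (P Phi)[s] = expected next-state feature vector from state s
    P_Phi = [[sum(P[s][t] * Phi[t][k] for t in range(n)) for k in range(d)]
             for s in range(n)]
    A = [[sum(mu[s] * Phi[s][i] * (Phi[s][j] - gamma * P_Phi[s][j])
              for s in range(n)) for j in range(d)] for i in range(d)]
    b = [sum(mu[s] * Phi[s][i] * r[s] for s in range(n)) for i in range(d)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


theta_star = td_fixed_point(P, PHI, R, GAMMA)
norm_sq = sum(x * x for x in theta_star)  # ||theta*||_2^2, the numerator of the rate
print(theta_star, norm_sq)
```

For this particular chain the sketch gives $\theta^* \approx (3.83, 5.14)$ and $\|\theta^*\|_2^2 \approx 41.0$; the stated bound is proportional to this quantity divided by $\sqrt{T}$, up to the constants and logarithmic factors hidden in the notation.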

Abstract

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in the field of reinforcement learning. We are interested in the so-called "robust" setting, where the convergence guarantee does not depend on the minimal curvature of the potential function. While prior work has established convergence guarantees in this setting, these results typically rely on the assumption that each iterate is projected onto a bounded set, a condition that is both artificial and at odds with current practice. In this paper, we challenge the necessity of such an assumption and present a refined analysis of TD learning. For the first time, we show that the simple projection-free variant converges with a rate of $\tilde{O}\left(\frac{\|\theta^*\|_2^2}{\sqrt{T}}\right)$, even in the presence of Markovian…
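The object of study in the abstract is the projection-free variant: linear TD(0) run along a single Markovian trajectory, with no projection of the iterates onto a bounded set. The sketch below shows that update loop under loudly stated assumptions: a hypothetical 3-state chain, features and rewards, a constant stepsize `eta = 0.01`, and plain iterate averaging. These are illustrative choices and are not the paper's stepsize schedule, which (per Reviewer 03's comment) depends on the horizon $T$, the feature bound and a constant $c$. The function names `next_state` and `projection_free_td0` are hypothetical.

```python
import random

# Hypothetical example chain (illustrative assumptions, not from the paper).
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
PHI = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # feature vector phi(s)
R = [1.0, 0.0, 2.0]                          # reward received in state s
GAMMA = 0.9


def next_state(rng, s):
    """Sample s' ~ P[s][.] by inverting the cumulative distribution."""
    u, acc = rng.random(), 0.0
    for t, p in enumerate(P[s]):
        acc += p
        if u < acc:
            return t
    return len(P[s]) - 1  # guard against floating-point rounding


def projection_free_td0(T, eta, seed=0):
    """Linear TD(0) along one Markovian trajectory, with no projection step.

    Uses a constant stepsize `eta` (an illustrative choice) and returns the
    running average of the iterates theta_1, ..., theta_T.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(PHI[0])
    avg = [0.0] * len(PHI[0])
    s = 0
    for t in range(1, T + 1):
        s_next = next_state(rng, s)  # consecutive samples are correlated
        v = sum(w * f for w, f in zip(theta, PHI[s]))
        v_next = sum(w * f for w, f in zip(theta, PHI[s_next]))
        delta = R[s] + GAMMA * v_next - v  # TD error
        # Semi-gradient update; a projected variant would clip theta here.
        theta = [w + eta * delta * f for w, f in zip(theta, PHI[s])]
        avg = [a + (w - a) / t for a, w in zip(avg, theta)]  # running mean
        s = s_next
    return avg


theta_bar = projection_free_td0(T=100_000, eta=0.01)
print(theta_bar)
```

Under these assumptions the averaged iterate should settle near the TD fixed point of this chain, which solving $A\theta^* = b$ by hand gives as roughly $(3.83, 5.14)$. The projected variants analyzed in prior work would add a step mapping `theta` back onto a ball of fixed radius after each update; the sketch deliberately omits it.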

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 0 · Confidence 4

Strengths

- First robust, **projection-free** guarantee for linear TD(0).
- Handles Markovian bias via mixing; clear comparison to prior work.

Weaknesses

- **The paper clearly violates the ICLR format: page numbers are missing and the whole text has been shifted down, so it should be desk rejected.**
- Experiments are narrow; the main text gives few task details.

Reviewer 02 · Rating 10 · Confidence 5

Strengths

1. The authors establish non-asymptotic bounds for TD(0) with linear function approximation under Markov noise, matching practical projection-free implementations.
2. The analysis shows that the iterates remain bounded in expectation without projections or strong convexity, by leveraging a self-bounding structure in the recursion.
3. With a generic stepsize schedule, the method attains $\tilde{O}\left(\frac{\|\theta^\star\|^2}{\sqrt{T}}\right)$, requiring no prior spectral/curvature information…

Weaknesses

The authors should provide intuition for their techniques: how they were derived and how they work in the proofs.

Reviewer 03 · Rating 2 · Confidence 4

Strengths

+ Improved convergence rate results for TD(0) algorithms with linear function approximation under Markovian noise
+ Interesting finding and use of the self-bounding property of TD(0) updates
+ Numerical validation of the theoretical results

Weaknesses

- Alg. 1 may not be practical, since the stepsize $\eta_t$ requires prior knowledge of the horizon $T$, the feature bound $\phi_\infty$, and a very large constant $c \ge 281$.
- The formal statement of Theorem 4.2 is missing from the main paper. I do not believe it is right for the main paper, which is the only part that will be published, to contain only an informal result while the formal result is relegated to the supplementary material.
- The paper compares its theoretical rate to prior results, but the experiments lack a crucial baseline…

Taxonomy

Topics: Machine Learning and ELM · Speech and Audio Processing · Neural Networks and Applications

Methods: Sparse Evolutionary Training