Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang

TL;DR
This paper provides the first $L^2$ convergence analysis of linear TD($\lambda$) with arbitrary features, extending theoretical guarantees to settings where the features are not linearly independent.
Contribution
It establishes $L^2$ convergence rates for linear TD($\lambda$) with arbitrary features, without modifications or extra assumptions, in both discounted and average-reward settings.
Findings
First $L^2$ convergence rates for arbitrary features
Applicable to both discounted and average-reward cases
Addresses non-uniqueness of solutions with a new stochastic approximation result
Abstract
Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates have typically been established under the assumption of linearly independent features, which does not hold in many practical scenarios. This paper instead establishes the first $L^2$ convergence rates for linear TD($\lambda$) operating under arbitrary features, without making any algorithmic modification or additional assumptions. Our results apply to both the discounted and average-reward settings. To address the potential non-uniqueness of solutions resulting from arbitrary features, we develop a novel stochastic approximation result featuring convergence rates to the solution set instead of a single point.
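To make the setting concrete, here is a minimal sketch of discounted linear TD($\lambda$) with accumulating eligibility traces, run on a feature matrix whose columns are deliberately linearly dependent. It is an illustration only, not the authors' code: the three-state chain `P`, the rewards `r`, the feature matrix `Phi`, the step-size schedule, and the function name `linear_td_lambda` are all assumptions chosen for the example.

```python
import numpy as np


def linear_td_lambda(P, r, Phi, gamma=0.9, lam=0.5, steps=5000, alpha0=0.1, seed=0):
    """Linear TD(lambda) with accumulating traces on a Markov reward process.

    P   : (n, n) row-stochastic transition matrix (assumed example input)
    r   : (n,) reward received when leaving each state
    Phi : (n, d) feature matrix; columns may be linearly dependent
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    w = np.zeros(d)  # weight vector, initialised at zero
    e = np.zeros(d)  # eligibility trace
    s = 0
    for t in range(steps):
        s_next = rng.choice(n_states, p=P[s])
        # Accumulating trace: decay the old trace, add the current features.
        e = gamma * lam * e + Phi[s]
        # Temporal-difference error for the observed transition.
        delta = r[s] + gamma * (Phi[s_next] @ w) - Phi[s] @ w
        # Decaying step size (an illustrative choice, not taken from the paper).
        alpha = alpha0 / (1.0 + t) ** 0.75
        w = w + alpha * delta * e
        s = s_next
    return w


# Hypothetical three-state chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([1.0, 0.0, -1.0])

# The fourth column duplicates the first, so the features are linearly dependent.
Phi = np.array([[1.0, 0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])

w = linear_td_lambda(P, r, Phi)
values = Phi @ w  # estimated state values
print("weights:", w)
print("values :", values)
```

Because the first and fourth columns of `Phi` are identical and the weights start at zero, those two weights receive identical updates throughout the run. Shifting mass between them while keeping their sum fixed leaves `Phi @ w` unchanged, which gives a small picture of why, with arbitrary features, convergence is more naturally measured to a set of solutions than to a single weight vector.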
