Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features
Jiuqi Wang, Shangtong Zhang

TL;DR
This paper proves that linear temporal difference (TD) learning converges almost surely even when the features are linearly dependent, extending its convergence guarantees to settings where the usual linear independence assumption fails.
Contribution
The paper gives the first almost sure convergence proof for linear TD that does not require linearly independent features, based on a novel analysis of the bounded invariant sets of the mean ODE.
Findings
The weight iterates of linear TD converge almost surely to a bounded set, with no linear independence assumption on the features.
The value estimates given by the weights in that set are the same almost everywhere.
Introduces a new characterization of invariant sets for the mean ODE.
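For orientation, the linear TD update and its mean ODE can be written in standard textbook form as below. This is a sketch under assumed notation, none of which comes from this summary: x(s) is the feature vector of state s, X is the matrix whose rows are the x(s), D is the diagonal matrix of the stationary distribution, P is the transition matrix, r is the expected reward vector, and \alpha_t is the step size.

```latex
% Assumed notation (illustrative, not taken from the paper's text):
% x(s) \in \mathbb{R}^d, X stacks x(s)^\top as rows, D = diag(stationary distribution)

% Linear TD update on a trajectory (S_t, R_{t+1}, S_{t+1}):
w_{t+1} = w_t + \alpha_t \bigl( R_{t+1} + \gamma\, x(S_{t+1})^\top w_t - x(S_t)^\top w_t \bigr)\, x(S_t)

% Associated mean ODE:
\frac{\mathrm{d} w(t)}{\mathrm{d} t} = A\, w(t) + b,
\qquad A = X^\top D\, (\gamma P - I)\, X,
\qquad b = X^\top D\, r
```

In the classical analysis, linearly independent columns of X make A negative definite, so the ODE has the single globally stable equilibrium -A^{-1} b. If the columns are dependent, any n with X n = 0 also satisfies A n = 0, so A is singular and that argument no longer yields a unique limit point.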
Abstract
Temporal difference (TD) learning with linear function approximation (linear TD) is a classic and powerful prediction algorithm in reinforcement learning. While it is well understood that linear TD converges almost surely to a unique point, this convergence traditionally requires the assumption that the features used by the approximator are linearly independent. However, this linear independence assumption does not hold in many practical scenarios. This work is the first to establish the almost sure convergence of linear TD without requiring linearly independent features. We prove that the weight iterates of linear TD converge to a bounded set, and that the value estimates derived from the weights in that set are the same almost everywhere. We also establish a notion of local stability of the weight iterates. Importantly, we do not impose assumptions tailored to feature dependence and…
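The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experiment or proof technique: the 3-state Markov reward process, the rewards, the discount factor, the step-size schedule, and the names `linear_td`, `values`, and `NULL_DIR` are all hypothetical choices made for illustration. The third feature column is the sum of the first two, so the features are linearly dependent. Two runs that share a trajectory but start from different weights are compared.

```python
import random

# Hypothetical 3-state Markov reward process (all numbers are illustrative).
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
R = [1.0, 0.0, -1.0]  # reward received when leaving each state
GAMMA = 0.5

# Feature matrix with linearly dependent columns: column 3 = column 1 + column 2.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 2.0]]
NULL_DIR = [1.0, 1.0, -1.0]  # every row of X is orthogonal to this direction


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_td(w0, steps=50_000, seed=0):
    """Run linear TD(0) along one sampled trajectory, starting from weights w0."""
    rng = random.Random(seed)
    w = list(w0)
    s = 0
    for t in range(steps):
        s_next = rng.choices(range(3), weights=P[s])[0]
        # Decaying step size (assumed schedule): sum diverges, sum of squares converges.
        alpha = 0.05 / (1.0 + t / 1000.0) ** 0.75
        td_error = R[s] + GAMMA * dot(X[s_next], w) - dot(X[s], w)
        # The update is a multiple of x(s), so it never moves w along NULL_DIR.
        w = [wi + alpha * td_error * xi for wi, xi in zip(w, X[s])]
        s = s_next
    return w


def values(w):
    """Value estimate x(s)^T w for every state s."""
    return [dot(x, w) for x in X]


w_a = linear_td([0.0, 0.0, 0.0])
w_b = linear_td([5.0, -3.0, 1.0])
print("weights A:", w_a, "values A:", values(w_a))
print("weights B:", w_b, "values B:", values(w_b))
```

In this toy problem each update is a multiple of a feature vector, so the component of the weights along `NULL_DIR` stays at its initial value. The two runs therefore end at different weight vectors, while their value estimates can still be compared state by state. This only illustrates why a unique limit point cannot be expected with dependent features; it says nothing about how the paper's general proof proceeds.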
Taxonomy
Topics: Machine Learning and ELM · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning
