Gradient Iterated Temporal-Difference Learning
Théo Vincent, Kevin Gerhardt, Yogesh Tripathi, Habib Maraqten, Adam White, Martha White, Jan Peters, Carlo D'Eramo

TL;DR
This paper introduces Gradient Iterated Temporal-Difference learning, a new gradient-based TD method that improves stability and learning speed, demonstrating competitive performance on benchmarks like Atari games.
Contribution
It develops a novel gradient TD algorithm based on iterated TD learning, addressing stability issues and achieving performance comparable to semi-gradient methods.
Findings
The proposed method is stable and competitive with semi-gradient TD methods.
It demonstrates strong performance on Atari benchmarks.
The algorithm effectively learns sequences of value functions in parallel.
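The idea of learning a sequence of value functions in parallel can be illustrated with a small tabular sketch. It assumes a stack of K+1 Q-tables in which table k is regressed toward the Bellman target computed from table k-1, all from the same transition. The function name `iterated_td_step`, the tabular setting, and the step size are illustrative assumptions; this is not the paper's algorithm, which is a gradient-based method.

```python
import numpy as np

def iterated_td_step(q_stack, s, a, r, s_next, gamma=0.99, lr=0.1):
    """One tabular update on a stack of Q-tables (hypothetical helper).

    q_stack has shape (K + 1, n_states, n_actions). Table 0 is kept fixed;
    table k (k >= 1) moves toward r + gamma * max_a' Q_{k-1}(s_next, a').
    """
    q_new = q_stack.copy()
    # Bellman targets built from tables 0..K-1, all read before any write,
    # so the K updates are effectively made in parallel.
    targets = r + gamma * q_stack[:-1, s_next].max(axis=1)  # shape (K,)
    # Move tables 1..K toward their respective targets.
    q_new[1:, s, a] += lr * (targets - q_stack[1:, s, a])
    return q_new

# Usage: two learned tables on top of a fixed first table.
q = np.zeros((3, 2, 2))
q[0, 1] = [1.0, 2.0]
q = iterated_td_step(q, s=0, a=0, r=1.0, s_next=1, gamma=0.5, lr=1.0)
```

With a step size of 1, each table lands exactly on its target, which makes the chaining visible: table 1 depends on table 0, and table 2 on the previous value of table 1.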
Abstract
Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the gradient of the bootstrapped estimate. While popular, this type of update is prone to divergence, as Baird's counterexample illustrates. Gradient TD methods were introduced to overcome this issue, but have not been widely used, potentially due to issues with learning speed compared to semi-gradient methods. Recently, iterated TD learning was developed to increase the learning speed of TD methods. To do so, it learns a sequence of action-value functions in parallel, where each function is optimized to represent the application of the Bellman operator over the previous function in the sequence. While promising, this algorithm can be unstable due to its…
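To make "ignoring the gradient of the bootstrapped estimate" concrete, here is a minimal sketch for a linear value function v(s) = phi(s) · w. The semi-gradient step treats the target r + gamma * v(s') as a constant, while the second step differentiates through it. Note the assumptions: the second variant is the plain residual-gradient update on the squared TD error, shown only for contrast. It is neither the Gradient TD family mentioned above nor the method proposed in the paper, and all function names are illustrative.

```python
import numpy as np

def td_error(w, phi_s, reward, phi_next, gamma):
    """TD error for a linear value function v(s) = phi(s) @ w."""
    return reward + gamma * (phi_next @ w) - (phi_s @ w)

def semi_gradient_step(w, phi_s, reward, phi_next, gamma, lr):
    """Semi-gradient TD(0): the bootstrapped target is held constant."""
    delta = td_error(w, phi_s, reward, phi_next, gamma)
    return w + lr * delta * phi_s

def residual_gradient_step(w, phi_s, reward, phi_next, gamma, lr):
    """Gradient descent on 0.5 * delta**2, including the target's gradient."""
    delta = td_error(w, phi_s, reward, phi_next, gamma)
    grad = delta * (gamma * phi_next - phi_s)  # d(0.5 * delta^2) / dw
    return w - lr * grad

# Usage: one transition with one-hot features.
w0 = np.zeros(2)
phi_s, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w_semi = semi_gradient_step(w0, phi_s, 1.0, phi_next, gamma=0.9, lr=0.1)
w_full = residual_gradient_step(w0, phi_s, 1.0, phi_next, gamma=0.9, lr=0.1)
```

The two updates agree on the weight of the current state's feature and differ only in the extra term that adjusts the next state's weight, which is the part the semi-gradient update drops.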
