A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
Manuel Loth (INRIA Futurs), Philippe Preux (INRIA Futurs)

TL;DR
This paper unifies TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD under a gradient minimization framework for policy evaluation with linear function approximation. It introduces two new algorithms, Full-gradient TD and EGD TD, that use samples much more efficiently than TD while keeping a gradient descent scheme.
Contribution
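It provides a unified theoretical view of TD algorithms as methods that minimize a gradient function, and introduces two novel algorithms that make better use of samples than TD while retaining a gradient descent scheme, which helps with complexity issues and optimistic policy iteration.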
Findings
Full-gradient TD generalizes the principle introduced in iLSTD.
EGD TD reduces gradients via successive equi-gradient descents.
The new algorithms make much better use of samples than TD while keeping a gradient descent scheme.
Abstract
This paper addresses policy evaluation in Markov Decision Processes using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD. It is asserted that they all amount to minimizing a gradient function, and that they differ in the form of this function and in their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. Together with iLSTD, they form a new intermediate family of three algorithms with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration.
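To make the contrast between TD and the iLSTD principle concrete, here is a minimal sketch and not the paper's method. It relies only on the standard facts that LSTD seeks theta with A theta = b, where A sums phi (phi - gamma phi')^T and b sums phi r, and that iLSTD keeps the residual mu = b - A theta and updates the coordinate with the largest |mu_j|. The toy chain, the one-hot features, the function names, and the 1/t step scaling are all assumptions made for illustration. Full-gradient TD and EGD TD are not reproduced, since their details are not given here.

```python
import random

N_STATES = 3   # hypothetical toy chain, not taken from the paper
GAMMA = 0.9

def phi(s):
    """One-hot (tabular) feature vector for state s."""
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]

def sample_chain(n_steps, seed=0):
    """Generate (phi(s), r, phi(s')) transitions from a fixed-policy toy chain."""
    rng = random.Random(seed)
    s, out = 0, []
    for _ in range(n_steps):
        s_next = (s + 1) % N_STATES if rng.random() < 0.8 else s
        r = 1.0 if s_next == 0 else 0.0
        out.append((phi(s), r, phi(s_next)))
        s = s_next
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def td0(samples, alpha=0.1):
    """TD(0): each sample drives exactly one stochastic step, then is discarded."""
    theta = [0.0] * N_STATES
    for f, r, f_next in samples:
        delta = r + GAMMA * dot(f_next, theta) - dot(f, theta)  # TD error
        theta = [t + alpha * delta * fi for t, fi in zip(theta, f)]
    return theta

def ilstd_style(samples, alpha=0.5, m=1):
    """Keep running sums A, b and the residual mu = b - A theta.
    After each sample, update only the m coordinates with the largest |mu_j|."""
    n = N_STATES
    A = [[0.0] * n for _ in range(n)]
    b, theta, mu = [0.0] * n, [0.0] * n, [0.0] * n
    for t, (f, r, f_next) in enumerate(samples, start=1):
        d = [fi - GAMMA * gi for fi, gi in zip(f, f_next)]  # phi - gamma * phi'
        d_theta = dot(d, theta)
        for i in range(n):
            for j in range(n):
                A[i][j] += f[i] * d[j]
            b[i] += f[i] * r
            mu[i] += f[i] * (r - d_theta)  # mu += delta_b - delta_A @ theta
        for _ in range(m):
            j = max(range(n), key=lambda k: abs(mu[k]))
            step = (alpha / t) * mu[j]  # 1/t scaling is an assumption: A, b are sums
            theta[j] += step
            for i in range(n):
                mu[i] -= step * A[i][j]  # keep mu consistent with the new theta
    return theta, A, b, mu

samples = sample_chain(300)
theta_td = td0(samples)
theta_il, A, b, mu = ilstd_style(samples)
```

The per-sample increment to mu is the feature vector times the TD error, so TD(0) steps along the newest sample's contribution only, while the iLSTD-style loop steps along the residual accumulated over all samples seen so far. That accumulated residual is the sense in which past samples keep being reused while each update remains a cheap descent step.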
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Markov Chains and Monte Carlo Methods
