A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

Manuel Loth (INRIA Futurs); Philippe Preux (INRIA Futurs)

arXiv:cs/0611145·cs.LG·May 23, 2007

TL;DR

This paper unifies various TD algorithms under a gradient minimization framework and introduces two new algorithms, Full-gradient TD and EGD TD, that improve sample efficiency while maintaining gradient descent properties.

Contribution

It provides a unified theoretical view of TD algorithms and introduces two novel algorithms that enhance sample efficiency and computational properties.

Findings

01

Full-gradient TD generalizes iLSTD principles.

02

EGD TD reduces gradients via successive equi-gradient descents.

03

The new algorithms outperform traditional TD in sample utilization.

Abstract

This paper addresses policy evaluation in Markov Decision Processes with linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD. It argues that they all amount to minimizing a gradient function, and that they differ in the form of this function and in the means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which generalizes the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the useful property of making much better use of the samples than TD while keeping a gradient descent scheme, which helps with complexity issues and optimistic policy iteration.
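
As a rough illustration of the unified view described in the abstract, the sketch below computes the summed TD update vector over a set of stored samples and drives it toward zero in three ways: sample-by-sample TD(0) sweeps, steps along the whole vector, and steps along its single largest component. The two-state chain, one-hot features, step size, and all function names are assumptions made for this example and do not come from the paper. The single-component step is only a simplification in the spirit of iLSTD; it is not the authors' Full-gradient TD or EGD TD.

```python
# Illustrative sketch only: the toy chain and every name here are assumptions.

GAMMA = 0.5  # discount factor (assumed)

# Stored samples (phi, reward, phi_next) from a hypothetical 2-state chain
# with one-hot features, a special case of linear function approximation.
SAMPLES = [
    ([1.0, 0.0], 0.0, [0.0, 1.0]),  # state 0 -> state 1, reward 0
    ([0.0, 1.0], 1.0, [1.0, 0.0]),  # state 1 -> state 0, reward 1
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td_error(theta, phi, r, phi_next, gamma=GAMMA):
    """Temporal-difference error of one sample under weights theta."""
    return r + gamma * dot(phi_next, theta) - dot(phi, theta)


def summed_td_vector(theta, samples):
    """Sum over samples of phi * td_error, which equals b - A theta."""
    total = [0.0] * len(theta)
    for phi, r, phi_next in samples:
        delta = td_error(theta, phi, r, phi_next)
        for i, f in enumerate(phi):
            total[i] += f * delta
    return total


def td0_sweep(theta, samples, alpha):
    """One pass of TD(0): a small step per sample, applied in sequence."""
    theta = list(theta)
    for phi, r, phi_next in samples:
        delta = td_error(theta, phi, r, phi_next)
        for i, f in enumerate(phi):
            theta[i] += alpha * f * delta
    return theta


def batch_step(theta, samples, alpha):
    """Step along the whole summed vector, reusing every stored sample."""
    g = summed_td_vector(theta, samples)
    return [t + alpha * gi for t, gi in zip(theta, g)]


def greedy_coordinate_step(theta, samples, alpha):
    """Update only the component where the summed vector is largest in magnitude."""
    g = summed_td_vector(theta, samples)
    j = max(range(len(g)), key=lambda i: abs(g[i]))
    theta = list(theta)
    theta[j] += alpha * g[j]
    return theta


def run(step, n_steps, alpha=0.5):
    """Apply `step` repeatedly from zero weights and return the final weights."""
    theta = [0.0, 0.0]
    for _ in range(n_steps):
        theta = step(theta, SAMPLES, alpha)
    return theta
```

On this toy problem, calling run(batch_step, 200), run(greedy_coordinate_step, 400), or run(td0_sweep, 200) each approaches the weights [2/3, 4/3], which is the solution of A theta = b that an LSTD-style method would obtain by solving the linear system directly. The schemes differ only in how the stored samples are used to shrink the same vector.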

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Markov Chains and Monte Carlo Methods