Gradient Descent Temporal Difference-difference Learning

Rong J.B. Zhu; James M. Murray

arXiv:2209.04624·cs.LG·September 13, 2022

TL;DR

This paper introduces Gradient-DD, an off-policy reinforcement learning algorithm that improves GTD2 by adding second-order differences in successive parameter updates. It converges faster than GTD2 and, in some cases, outperforms conventional TD learning.

Contribution

The paper proposes Gradient-DD, an algorithm that extends GTD2 with second-order differences in successive parameter updates, and provides a theoretical convergence proof and empirical validation.

Findings

01 Gradient-DD converges faster than GTD2.

02 Gradient-DD outperforms GTD2 in various tasks.

03 In some cases, Gradient-DD surpasses conventional TD learning.

Abstract

Off-policy algorithms, in which a behavior policy differs from the target policy and is used to gain experience for learning, have proven to be of great practical value in reinforcement learning. However, even for simple convex problems such as linear value function approximation, these algorithms are not guaranteed to be stable. To address this, alternative algorithms that are provably convergent in such cases have been introduced, the most well known being gradient descent temporal difference (GTD) learning. This algorithm and others like it, however, tend to converge much more slowly than conventional temporal difference learning. In this paper we propose gradient descent temporal difference-difference (Gradient-DD) learning in order to improve GTD2, a GTD algorithm, by introducing second-order differences in successive parameter updates. We investigate this algorithm in the…
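For orientation, the baseline that the abstract refers to can be sketched in a few lines. The block below is a minimal illustration of a single GTD2 step with linear value function approximation, in the standard form usually attributed to Sutton et al. (2009). It is not the Gradient-DD update, which the excerpt above does not spell out. The function name `gtd2_update`, the helper `dot`, and all parameter names are hypothetical, and the importance-sampling correction needed for general off-policy data is deliberately omitted.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gtd2_update(
    theta: Vector,     # value-function weights: V(s) is approximated by theta . phi(s)
    w: Vector,         # auxiliary weights used by GTD2
    phi: Vector,       # features of the current state
    phi_next: Vector,  # features of the next state
    reward: float,
    gamma: float,      # discount factor
    alpha: float,      # step size for theta
    beta: float,       # step size for w
) -> Tuple[Vector, Vector, float]:
    """One GTD2 step (assumed standard form, no importance weighting)."""
    # Temporal-difference error under the current value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Both updates read the old values of theta and w.
    phi_w = dot(phi, w)
    new_theta = [
        t + alpha * (p - gamma * pn) * phi_w
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    new_w = [wi + beta * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w, delta


if __name__ == "__main__":
    theta, w = [0.0, 0.0], [0.0, 0.0]
    for _ in range(2):
        theta, w, delta = gtd2_update(
            theta, w, phi=[1.0, 0.0], phi_next=[0.0, 1.0],
            reward=1.0, gamma=0.9, alpha=0.1, beta=0.5,
        )
    print(theta, w, delta)
```

In this sketch the primary weights `theta` only move once the auxiliary weights `w` are nonzero. That two-timescale coupling is one commonly cited reason GTD-style methods tend to learn more slowly than conventional TD, which is the gap Gradient-DD is proposed to narrow.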

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research