New Versions of Gradient Temporal Difference Learning

Donghwan Lee; Han-Dong Lim; Jihoon Park; and Okyong Choi

arXiv:2109.04033 · cs.LG · January 23, 2024

TL;DR

This paper introduces variants of gradient temporal-difference learning algorithms, unifies them under a convex-concave saddle-point framework, and provides theoretical stability analysis along with numerical comparisons.

Contribution

It proposes new GTD variants based on saddle-point interpretations and establishes a unified framework with stability analysis.
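
A minimal sketch of the saddle-point reading of GTD. It assumes the standard expected quantities from the GTD literature, A = E[ρφ(φ − γφ')ᵀ], b = E[ρrφ] and C = E[φφᵀ]; the paper's own notation and variants may differ. For L(θ, λ) = λᵀ(b − Aθ) − ½λᵀCλ, the inner maximum is attained at λ = C⁻¹(b − Aθ) and equals ½(b − Aθ)ᵀC⁻¹(b − Aθ), half the mean-squared projected Bellman error, so minimizing over θ while maximizing over λ is a convex-concave saddle-point problem. The code runs simultaneous gradient descent in θ and ascent in λ (primal-dual gradient dynamics) with one shared step size for simplicity, which matches the expected form of the GTD2 update. The function names and the 2×2 matrices are hypothetical illustrations, not data from the paper.

```python
# Minimal sketch (not the paper's code): expected GTD2 viewed as primal-dual
# gradient dynamics on L(theta, lam) = lam^T (b - A theta) - 0.5 lam^T C lam.
# All names and numbers here are hypothetical illustrations.

def matvec(M, v):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def primal_dual_gtd(A, b, C, alpha=0.1, steps=2000):
    n = len(b)
    theta = [0.0] * n  # primal variable: value-function weights
    lam = [0.0] * n    # dual variable: plays the role of GTD2's auxiliary weights
    At = transpose(A)
    for _ in range(steps):
        # descent direction in theta: -grad_theta L = A^T lam
        g_theta = matvec(At, lam)
        # ascent direction in lam: grad_lam L = b - A theta - C lam
        a_theta = matvec(A, theta)
        c_lam = matvec(C, lam)
        g_lam = [b[i] - a_theta[i] - c_lam[i] for i in range(n)]
        # simultaneous update: both directions use the previous iterate
        theta = [theta[i] + alpha * g_theta[i] for i in range(n)]
        lam = [lam[i] + alpha * g_lam[i] for i in range(n)]
    return theta, lam

A = [[1.0, 0.5], [-0.3, 0.8]]  # hypothetical: nonsingular, not symmetric
b = [1.0, 0.5]
C = [[1.0, 0.2], [0.2, 0.5]]   # hypothetical: symmetric positive definite
theta, lam = primal_dual_gtd(A, b, C)
print([round(t, 4) for t in theta])  # [0.5789, 0.8421], i.e. A^{-1} b
```

The iterates approach θ = A⁻¹b (the TD fixed point) and λ = 0. A standard primal-dual argument explains why: when C is positive definite and A is nonsingular, every eigenvalue of the matrix [[0, Aᵀ], [−A, −C]] has a negative real part, so the continuous-time dynamics are exponentially stable and a small enough step size inherits that stability. This illustrates the kind of result the abstract refers to; it is not a restatement of the paper's theorem.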

Findings

1. Variants show improved stability and convergence properties.
2. A unified saddle-point framework simplifies the analysis of GTDs.
3. Numerical results demonstrate competitive performance.

Abstract

Sutton, Szepesvári and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goal of this paper is (a) to propose some variants of GTDs with extensive comparative analysis and (b) to establish new theoretical analysis frameworks for the GTDs. These variants are based on convex-concave saddle-point interpretations of GTDs, which effectively unify all the GTDs into a single framework and provide a simple stability analysis based on recent results on primal-dual gradient dynamics. Finally, a numerical comparative analysis is given to evaluate these approaches.

Taxonomy

Topics: Model Reduction and Neural Networks · Adaptive Dynamic Programming Control