New Versions of Gradient Temporal Difference Learning
Donghwan Lee, Han-Dong Lim, Jihoon Park, and Okyong Choi

TL;DR
This paper introduces variants of gradient temporal-difference learning algorithms, unifies them under a convex-concave saddle-point framework, and provides theoretical stability analysis along with numerical comparisons.
Contribution
The paper proposes new GTD variants derived from convex-concave saddle-point interpretations and unifies the GTDs in a single framework with a stability analysis based on primal-dual gradient dynamics.
Findings
Variants show improved stability and convergence properties
Unified saddle-point framework simplifies the stability analysis of GTDs
Numerical results demonstrate competitive performance
Abstract
Sutton, Szepesvári, and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goals of this paper are (a) to propose variants of GTDs with extensive comparative analysis and (b) to establish new theoretical analysis frameworks for GTDs. These variants are based on convex-concave saddle-point interpretations of GTDs, which unify all the GTDs into a single framework and provide a simple stability analysis based on recent results on primal-dual gradient dynamics. Finally, a numerical comparative analysis is given to evaluate these approaches.
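To make the saddle-point view concrete, here is a minimal sketch. It does not implement the paper's new variants. It assumes the commonly used GTD2-style objective L(theta, w) = w*(b - A*theta) - 0.5*C*w^2 in its expected (noise-free) form, applied to a two-state, one-feature off-policy example. The example chain, the step size, the step count, and all names (`expected_quantities`, `expected_td`, `expected_gtd2`) are assumptions chosen for illustration.

```python
# Two-state off-policy example with one scalar feature. All values are
# illustrative assumptions, not taken from the paper.
GAMMA = 0.9
PHI = [1.0, 2.0]       # feature of state 0 and of state 1
NEXT = [1, 1]          # under the target policy both states move to state 1
REWARD = [0.0, 0.0]    # zero reward, so the TD fixed point is theta = 0
WEIGHT = [0.5, 0.5]    # state weighting induced by the behavior policy


def expected_quantities():
    """Return the scalars A, b, C of the projected Bellman equation A*theta = b."""
    A = sum(d * PHI[s] * (PHI[s] - GAMMA * PHI[NEXT[s]])
            for s, d in enumerate(WEIGHT))
    b = sum(d * PHI[s] * REWARD[s] for s, d in enumerate(WEIGHT))
    C = sum(d * PHI[s] ** 2 for s, d in enumerate(WEIGHT))
    return A, b, C


def expected_td(theta, steps, lr):
    """Expected off-policy TD(0): theta <- theta + lr * (b - A * theta)."""
    A, b, _ = expected_quantities()
    for _ in range(steps):
        theta = theta + lr * (b - A * theta)
    return theta


def expected_gtd2(theta, steps, lr):
    """Primal-dual gradient steps on L(theta, w) = w*(b - A*theta) - 0.5*C*w**2."""
    A, b, C = expected_quantities()
    w = 0.0
    for _ in range(steps):
        grad_theta = -A * w               # dL/dtheta
        grad_w = b - A * theta - C * w    # dL/dw
        # Simultaneous update: descend in theta, ascend in w.
        theta, w = theta - lr * grad_theta, w + lr * grad_w
    return theta


A, b, C = expected_quantities()
theta_td = expected_td(1.0, steps=3000, lr=0.5)
theta_gtd = expected_gtd2(1.0, steps=3000, lr=0.5)
print(f"A = {A:.2f}, C = {C:.2f}")
print(f"TD(0): {theta_td:.3e}   GTD2-style primal-dual: {theta_gtd:.3e}")
```

In this toy setting A is negative, so the expected TD(0) iteration moves away from the fixed point theta = 0, while the primal-dual iteration contracts toward it. With vector features the scalar products become matrix-vector products, and the theta update uses the transpose of A applied to w.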
Taxonomy
Topics: Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
