Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

Bo Liu; Ian Gemp; Mohammad Ghavamzadeh; Ji Liu; Sridhar Mahadevan; Marek Petrik

arXiv:2006.03976·cs.LG·June 9, 2020·5 cites

TL;DR

This paper introduces proximal gradient TD learning, a framework that derives gradient TD methods from a primal-dual saddle-point objective, gives finite-sample bounds on their performance, and proposes an accelerated algorithm, GTD2-MP.

Contribution

The paper derives gradient TD methods from a primal-dual saddle-point perspective and proposes an accelerated algorithm, GTD2-MP, with an improved convergence rate.
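As a rough illustration of the saddle-point view, the block below writes the GTD2 objective in primal-dual form. The symbols are assumptions, since this page does not define them: \phi and \phi' are the current and next feature vectors, r the reward, \gamma the discount factor, \rho the importance weight, \theta the value-function weights, and y the dual variable.

```latex
% Assumed definitions (not stated on this page):
%   A = E[\rho\,\phi(\phi - \gamma\phi')^\top], \quad
%   b = E[\rho\,\phi\,r], \quad
%   C = E[\phi\phi^\top]
\begin{align}
  \min_{\theta}\,\max_{y}\; L(\theta, y)
    &= \langle b - A\theta,\; y \rangle - \tfrac{1}{2}\,\lVert y \rVert_{C}^{2} \\
  y^{*}(\theta) &= C^{-1}(b - A\theta) \\
  \max_{y} L(\theta, y)
    &= \tfrac{1}{2}\,\lVert b - A\theta \rVert_{C^{-1}}^{2}
\end{align}
```

Under these assumptions, maximizing over y recovers a projected Bellman error objective in \theta up to the factor 1/2, so stochastic gradient steps on L can be taken in \theta and y without forming C^{-1}.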

Findings

01 Finite-sample bounds for gradient TD algorithms

02 The proposed GTD2-MP algorithm accelerates convergence

03 Algorithms are suitable for off-policy learning with linear complexity

Abstract

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal "mirror maps" to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of…
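To make the primal-dual updates concrete, here is a minimal Python sketch of a GTD2-style step and a mirror-prox (extragradient) variant in the spirit of GTD2-MP. It is an illustration under assumptions, not the authors' implementation: the function names, the shared step size `alpha`, the importance weight `rho`, and the one-state toy problem are all hypothetical choices made for this example.

```python
import numpy as np


def td_error(theta, phi, phi_next, reward, gamma):
    """One-step TD error under linear value estimates phi @ theta."""
    return reward + gamma * (phi_next @ theta) - (phi @ theta)


def gtd2_step(theta, y, phi, phi_next, reward, gamma, alpha, rho=1.0):
    """One GTD2-style primal-dual update (assumed form, shared step size)."""
    delta = td_error(theta, phi, phi_next, reward, gamma)
    # Dual ascent step on y, primal descent step on theta; both use the old y.
    y_new = y + alpha * (rho * delta - phi @ y) * phi
    theta_new = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ y)
    return theta_new, y_new


def gtd2_mp_step(theta, y, phi, phi_next, reward, gamma, alpha, rho=1.0):
    """Mirror-prox (extragradient) variant: look ahead, then update."""
    # Step 1: provisional step to an intermediate point.
    theta_mid, y_mid = gtd2_step(theta, y, phi, phi_next, reward, gamma, alpha, rho)
    # Step 2: re-evaluate the gradient there and apply it from the original point.
    delta_mid = td_error(theta_mid, phi, phi_next, reward, gamma)
    y_new = y + alpha * (rho * delta_mid - phi @ y_mid) * phi
    theta_new = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ y_mid)
    return theta_new, y_new


def run(step, n_steps=2000, alpha=0.1, gamma=0.5):
    """Hypothetical toy problem: one state, feature 1.0, reward 1.0."""
    phi = np.array([1.0])
    phi_next = np.array([1.0])
    theta = np.zeros(1)
    y = np.zeros(1)
    for _ in range(n_steps):
        theta, y = step(theta, y, phi, phi_next, 1.0, gamma, alpha)
    return theta, y


if __name__ == "__main__":
    print("GTD2:   ", run(gtd2_step)[0])
    print("GTD2-MP:", run(gtd2_mp_step)[0])
```

In this toy problem the true value is 1 / (1 - 0.5) = 2, so both updates should drive `theta` toward 2 while the dual variable `y` decays toward zero. Each step uses only vector operations on the features, so the per-step cost is linear in the number of features.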

Code & Models

Repositories

sinaghiassian/OffpolicyAlgorithms

Taxonomy

Topics: Model Reduction and Neural Networks · Adaptive Dynamic Programming Control · Advanced Multi-Objective Optimization Algorithms