Finite-Sample Analysis of Proximal Gradient TD Algorithms

Bo Liu; Ji Liu; Mohammad Ghavamzadeh; Sridhar Mahadevan; Marek Petrik

arXiv:2006.14364 · cs.LG · July 6, 2020 · 105 cites

Open Access

TL;DR

This paper provides the first finite-sample convergence analysis of the GTD family of off-policy reinforcement learning algorithms by formulating them as stochastic gradient methods, and it proposes two improved algorithms.

Contribution

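It introduces a finite-sample analysis framework for GTD algorithms by formulating them as stochastic gradient methods on a primal-dual saddle-point objective. It also proposes two revised algorithms: projected GTD2, which offers improved convergence guarantees, and GTD2-MP, which offers acceleration.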
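As a rough illustration of what a saddle-point formulation of GTD2 looks like, the block below writes one common form of the objective. The symbols A, b, M, y, the importance weight \rho_t, and the feature vectors \phi_t, \phi_t' are notation assumed here for illustration; none of them appear in this summary, so treat this as a sketch rather than the paper's exact statement.

```latex
% Assumed notation: linear value estimate \phi^\top \theta, discount \gamma,
% reward r_t, next-state features \phi_t', importance weight \rho_t.
\min_{\theta} \max_{y} \; L(\theta, y)
  = \langle b - A\theta,\; y \rangle - \tfrac{1}{2}\,\|y\|_{M}^{2},
\qquad
A = \mathbb{E}\!\left[\rho_t\, \phi_t (\phi_t - \gamma \phi_t')^{\top}\right],
\quad
b = \mathbb{E}\!\left[\rho_t\, \phi_t r_t\right],
\quad
M = \mathbb{E}\!\left[\phi_t \phi_t^{\top}\right].

% Maximizing over y in closed form gives y^* = M^{-1}(b - A\theta), so
\max_{y} L(\theta, y) = \tfrac{1}{2}\,\|b - A\theta\|_{M^{-1}}^{2}.
```

Under these assumptions, minimizing over \theta after the inner maximization recovers a weighted least-squares objective in the residual b - A\theta, which is why stochastic gradient steps on L can be analyzed with saddle-point error bounds.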

Findings

01. GTD algorithms are comparable to LSTD in off-policy scenarios

02. Finite-sample bounds are established for GTD methods

03. Proposed algorithms offer improved convergence and acceleration

Abstract

In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has not been much work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms w.r.t. a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms are indeed…
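To make the "stochastic gradient on a primal-dual objective" reading concrete, here is a minimal sketch of a single GTD2-style update with linear features, written as a descent step on the primal weights and an ascent step on the dual weights. The function name `gtd2_step`, its parameters, and the use of an importance weight `rho` are assumptions for illustration, not the paper's code. The sketch also omits the projection step that the projected GTD2 variant would add.

```python
import numpy as np

def gtd2_step(theta, y, phi, reward, phi_next, gamma, alpha, beta, rho=1.0):
    """One GTD2-style primal-dual update from a single transition (hypothetical helper).

    theta    : primal weights of the linear value estimate phi @ theta
    y        : dual (auxiliary) weights
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance weight for off-policy samples (assumed; 1.0 if on-policy)
    """
    # Temporal-difference error under the current primal weights.
    delta = reward + gamma * (phi_next @ theta) - (phi @ theta)

    # Primal step: sampled gradient descent on theta, using the old dual weights.
    theta_new = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ y)

    # Dual step: sampled gradient ascent on y, using the old primal weights.
    y_new = y + beta * (rho * delta - (phi @ y)) * phi

    return theta_new, y_new


if __name__ == "__main__":
    # Toy transition with two features; the values are arbitrary.
    theta, y = np.zeros(2), np.zeros(2)
    phi, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    for _ in range(2):
        theta, y = gtd2_step(theta, y, phi, 1.0, phi_next,
                             gamma=0.5, alpha=0.1, beta=0.2)
    print(theta, y)
```

Both updates read the weights from before the step, so the pair behaves as one simultaneous gradient step on the primal and dual variables rather than two sequential ones.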

Taxonomy

Topics: Reinforcement Learning in Robotics · Iterative Learning Control Systems · Adaptive Dynamic Programming Control