Finite-Sample Analysis of Proximal Gradient TD Algorithms
Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

TL;DR
This paper gives the first finite-sample convergence analysis of the GTD family of off-policy reinforcement learning algorithms by formulating them as stochastic gradient methods on a saddle-point objective, and it proposes two improved variants.
Contribution
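The paper formulates GTD algorithms as stochastic gradient methods on a primal-dual saddle-point objective, derives finite-sample bounds through a saddle-point error analysis, and proposes two revised algorithms: projected GTD2, with improved convergence guarantees, and GTD2-MP, with acceleration.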
Findings
GTD algorithms are comparable to LSTD in off-policy scenarios
Finite-sample bounds are established for GTD methods
Projected GTD2 offers improved convergence guarantees, and GTD2-MP offers acceleration
Abstract
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has not been much work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms with respect to a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms are indeed…
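The primal-dual view in the abstract can be illustrated with a minimal sketch of a GTD2-style update under linear function approximation. It assumes the commonly used saddle-point form min over theta, max over y of <b - A theta, y> - (1/2) y^T M y, with per-sample estimates A_t = rho * phi (phi - gamma * phi')^T, b_t = rho * r * phi, and M_t = phi phi^T. The function name `gtd2_step`, its parameters, and the choice to use the old dual iterate in the primal update are illustrative assumptions, not taken from this page. The projection step of projected GTD2 and the extragradient step of GTD2-MP are omitted.

```python
import numpy as np


def gtd2_step(theta, y, phi, reward, phi_next, gamma, alpha, rho=1.0):
    """One GTD2-style primal-dual stochastic gradient step (illustrative sketch).

    theta    : primal weights of the linear value estimate
    y        : dual (auxiliary) weights
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance-sampling ratio (1.0 means on-policy); assumed given
    """
    # TD error under the current primal weights.
    delta = reward + gamma * (phi_next @ theta) - (phi @ theta)
    # Stochastic gradient ascent in y: b_t - A_t theta - M_t y.
    y_new = y + alpha * (rho * delta - (phi @ y)) * phi
    # Stochastic gradient descent in theta: the descent direction is A_t^T y,
    # evaluated at the old y (an assumption of this sketch).
    theta_new = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ y)
    return theta_new, y_new


# Hypothetical usage: replay one made-up transition twice.
theta, y = np.zeros(2), np.zeros(2)
phi, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for _ in range(2):
    theta, y = gtd2_step(theta, y, phi, 1.0, phi_next, gamma=0.5, alpha=0.1)
print(theta, y)
```

The dual variable moves first: starting from zero, theta changes only once y has picked up a nonzero estimate, which reflects the coupling between the two updates in the saddle-point formulation.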
Taxonomy
Topics: Reinforcement Learning in Robotics · Iterative Learning Control Systems · Adaptive Dynamic Programming Control
