Revisiting Peng's Q(λ) for Modern Reinforcement Learning

Tadashi Kozuno; Yunhao Tang; Mark Rowland; Rémi Munos; Steven Kapturowski; Will Dabney; Michal Valko; David Abel

arXiv:2103.00107 · cs.LG · March 2, 2021 · 1 citation

TL;DR

This paper provides the first theoretical convergence proof for Peng's Q(λ), a non-conservative off-policy reinforcement learning algorithm, and demonstrates its practical effectiveness in complex continuous control tasks.

Contribution

It proves Peng's Q(λ) converges to an optimal policy under certain conditions and validates its empirical performance in complex tasks.

Findings

01

Peng's Q(λ) converges to an optimal policy when the behavior policy tracks a greedy policy.

02

Peng's Q(λ) often outperforms conservative algorithms in continuous control tasks.

03

Theoretical analysis confirms Peng's Q(λ) is both sound and effective.
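
Finding 01 depends on the behavior policy tracking a greedy policy slowly. The exact update rule is not given on this page, so the sketch below assumes the standard conservative-policy-iteration mixture, in which the new policy is a convex combination of the old policy and the greedy policy with respect to the current Q-values. The function name `track_greedy_policy` and the parameter `step_size` are hypothetical names chosen for illustration.

```python
import numpy as np


def track_greedy_policy(policy, q_values, step_size=0.1):
    """Move a tabular policy a small step toward the greedy policy.

    ASSUMPTION: conservative-policy-iteration style mixture,
        new_policy = (1 - step_size) * policy + step_size * greedy(q_values).
    This is an illustration, not necessarily the paper's exact rule.

    policy:   array of shape (num_states, num_actions), rows sum to 1
    q_values: array of shape (num_states, num_actions)
    """
    policy = np.asarray(policy, dtype=float)
    q_values = np.asarray(q_values, dtype=float)

    # One-hot greedy policy: probability 1 on the argmax action of each state.
    greedy = np.zeros_like(policy)
    greedy[np.arange(policy.shape[0]), q_values.argmax(axis=1)] = 1.0

    # Convex combination keeps every row a valid probability distribution.
    return (1.0 - step_size) * policy + step_size * greedy
```

A small `step_size` makes the behavior policy change slowly between updates, which is the "slow tracking" condition the finding refers to; `step_size=1.0` would jump straight to the greedy policy.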

Abstract

Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have a limited or no theoretical guarantee. Nonetheless, recent studies have shown that non-conservative algorithms empirically outperform conservative ones. Motivated by the empirical results and the lack of theory, we carry out theoretical analyses of Peng's Q(λ), a representative example of non-conservative algorithms. We prove that it also converges to an optimal policy provided that the behavior policy slowly tracks a greedy policy in a way similar to conservative policy iteration. Such a result has been conjectured to be true but…
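
To make "non-conservative" concrete, the following is a minimal sketch of how Peng's Q(λ) targets are commonly computed on a single stored trajectory, using the recursion G_t = r_t + γ[(1 − λ) max_a Q(s_{t+1}, a) + λ G_{t+1}]. No importance weights or trace cutting appear, regardless of how far the behavior policy is from the greedy one. The function name `peng_q_lambda_targets`, the array layout, and the default values of `gamma` and `lam` are assumptions made for illustration, not taken from the paper's code.

```python
import numpy as np


def peng_q_lambda_targets(rewards, next_q_values, gamma=0.99, lam=0.7):
    """Backward recursion for Peng's Q(lambda) targets on one trajectory.

    rewards:       shape (T,), reward r_t received at step t
    next_q_values: shape (T, num_actions), Q(s_{t+1}, .) for each step t;
                   pass a row of zeros where s_{t+1} is terminal
    Returns an array of shape (T,) with the target G_t for each step.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_max = np.asarray(next_q_values, dtype=float).max(axis=1)

    num_steps = len(rewards)
    targets = np.empty(num_steps)

    # Beyond the end of the trajectory, bootstrap from max_a Q(s_T, a).
    g_next = next_max[-1]
    for t in reversed(range(num_steps)):
        # Mix the one-step bootstrap with the longer return.
        # Traces are never cut: lam is applied unconditionally.
        mixed = (1.0 - lam) * next_max[t] + lam * g_next
        targets[t] = rewards[t] + gamma * mixed
        g_next = targets[t]
    return targets
```

With `lam=0` the targets reduce to one-step Q-learning targets, and with `lam=1` they become uncorrected multi-step returns that bootstrap only at the end of the trajectory. A conservative method would instead shrink or zero the trace whenever the taken action is unlikely under the target policy.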

Code & Models

Datasets

misovalko/my-research-papers
dataset · 21 downloads

Videos

Revisiting Peng's Q(λ) for Modern Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Optimization and Search Problems