Safe and Efficient Off-Policy Reinforcement Learning

Rémi Munos; Tom Stepleton; Anna Harutyunyan; Marc G. Bellemare

arXiv:1606.02647·cs.LG·November 9, 2016·94 cites

TL;DR

This paper introduces Retrace(λ), an off-policy reinforcement learning algorithm that has low variance, safely uses samples from any behaviour policy, and makes efficient use of samples from near on-policy behaviour. It comes with convergence proofs and is evaluated on Atari 2600 games.

Contribution

The paper presents Retrace(λ), a return-based off-policy RL algorithm whose operator is analyzed as a contraction in both the policy evaluation and control settings, and which converges almost surely to the optimal Q-values without the GLIE (Greedy in the Limit with Infinite Exploration) assumption.

Findings

01. Retrace(λ) is safe and low-variance for off-policy learning.

02. Retrace(λ) is proven to converge, and the convergence of Watkins' Q(λ) follows as a corollary.

03. Retrace(λ) is applied successfully to Atari 2600 games.

Abstract

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace($λ$), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of "off-policyness"; and (3) it is efficient as it makes the best use of samples collected from near on-policy behaviour policies. We analyze the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. We believe this is the first return-based off-policy control algorithm converging a.s. to $Q^{*}$ without the GLIE assumption (Greedy in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins' Q($λ$), which was an open…
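The abstract does not spell out the update itself. As an illustration only, the following is a minimal sketch of how Retrace(λ) targets could be computed along one recorded trajectory, assuming the truncated trace coefficient c_s = λ·min(1, π(a_s|x_s)/μ(a_s|x_s)) defined in the full paper. The function name `retrace_targets`, the array layout, and the default values of `gamma` and `lam` are assumptions made for this example and are not the authors' code.

```python
import numpy as np


def retrace_targets(q, rewards, actions, pi, mu, gamma=0.99, lam=1.0):
    """Sketch of Retrace(lambda) targets for Q(x_t, a_t) on one trajectory.

    Assumed (hypothetical) layout for a trajectory of T transitions:
      q       : (T + 1, A) current Q estimates at x_0..x_T
                (use zeros in the last row if x_T is terminal)
      rewards : (T,)   reward r_t received after taking a_t in x_t
      actions : (T,)   integer actions a_t chosen by the behaviour policy
      pi      : (T + 1, A) target-policy probabilities at x_0..x_T
      mu      : (T,)   behaviour probabilities mu(a_t | x_t) of the taken actions
    """
    T = len(rewards)
    idx = np.arange(T)

    q_taken = q[idx, actions]                      # Q(x_t, a_t)
    ratio = pi[idx, actions] / mu                  # importance ratio pi / mu
    c = lam * np.minimum(1.0, ratio)               # truncated at 1: bounds the variance

    v_next = np.sum(pi[1:] * q[1:], axis=1)        # E_pi[Q(x_{t+1}, .)]
    delta = rewards + gamma * v_next - q_taken     # expected TD errors

    # Backward recursion: corr_t = delta_t + gamma * c_{t+1} * corr_{t+1}
    correction = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        next_c = c[t + 1] if t + 1 < T else 0.0    # nothing follows the last step
        acc = delta[t] + gamma * next_c * acc
        correction[t] = acc

    return q_taken + correction
```

Two limiting cases illustrate the properties listed in the abstract. With `lam=0` the traces are cut immediately and the target reduces to the one-step expected backup `r_t + gamma * E_pi[Q(x_{t+1}, .)]`. With `lam=1` and the taken actions at least as likely under the target policy as under the behaviour policy, the coefficients equal 1 and the full return is used. Because each coefficient is capped at 1, a very small `mu` cannot blow up the product of traces.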

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Formal Methods in Verification

Methods: Retrace