Online and Offline Reinforcement Learning by Planning with a Learned Model

Julian Schrittwieser; Thomas Hubert; Amol Mandhane; Mohammadamin Barekatain; Ioannis Antonoglou; David Silver

arXiv:2104.06294 · cs.LG · April 14, 2021 · 26 cites

TL;DR

This paper introduces MuZero Unplugged, a single reinforcement learning algorithm that handles both online and offline settings by planning with a learned model, achieving state-of-the-art results in both without offline-specific adaptations.

Contribution

The paper describes Reanalyse, an algorithm for data-efficient learning that computes improved training targets on existing data, and combines it with MuZero to create MuZero Unplugged, a single method for data budgets ranging over several orders of magnitude in reinforcement learning.

Findings

01

Sets a new state of the art on an offline RL benchmark

02

Achieves state-of-the-art results on the Atari online RL benchmark

03

Learns effectively from fixed data, without environment interaction or offline-specific adaptations

Abstract

Learning efficiently from small amounts of data has long been the focus of model-based reinforcement learning, both for the online case when interacting with the environment and the offline case when learning from a fixed dataset. However, to date no single unified algorithm could demonstrate state-of-the-art results in both settings. In this work, we describe the Reanalyse algorithm which uses model-based policy and value improvement operators to compute new improved training targets on existing data points, allowing efficient learning for data budgets varying by several orders of magnitude. We further show that Reanalyse can also be used to learn entirely from demonstrations without any environment interactions, as in the case of offline Reinforcement Learning (offline RL). Combining Reanalyse with the MuZero algorithm, we introduce MuZero Unplugged, a single unified algorithm for any…
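The abstract's core idea, recomputing training targets for data that is already stored, can be illustrated in a few lines. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: `search_fn`, `reanalyse`, `visits_to_policy` and `Targets` are hypothetical names; the policy target is assumed to be the normalised visit counts of a fresh search with the latest model; the value target is assumed to be an n-step return over the stored rewards that bootstraps from the fresh search's root value (a target-network value estimate would be another plausible choice). Each stored trajectory is assumed to end at episode termination, and the defaults `n=5` and `discount=0.997` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical search interface (assumption): given an observation, return
# per-action visit counts and the root value estimate from a planning run
# (e.g. MCTS) that uses the *latest* learned model.
SearchFn = Callable[[object], Tuple[Sequence[int], float]]


@dataclass
class Targets:
    policy: List[List[float]]  # one distribution over actions per step
    value: List[float]         # one n-step value target per step


def visits_to_policy(visit_counts: Sequence[int]) -> List[float]:
    """Normalise search visit counts into a policy target."""
    total = sum(visit_counts)
    return [c / total for c in visit_counts]


def reanalyse(observations: Sequence[object], rewards: Sequence[float],
              search_fn: SearchFn, n: int = 5,
              discount: float = 0.997) -> Targets:
    """Recompute policy and value targets for one stored trajectory.

    rewards[t] is the reward received after acting at step t; the
    trajectory is assumed to end at episode termination.
    """
    T = len(observations)
    # Re-run search on every stored state with the current model.
    searches = [search_fn(obs) for obs in observations]
    policies = [visits_to_policy(counts) for counts, _ in searches]
    root_values = [value for _, value in searches]

    values = []
    for t in range(T):
        horizon = min(n, T - t)
        # Discounted sum of stored rewards over the next `horizon` steps.
        target = sum(discount ** k * rewards[t + k] for k in range(horizon))
        if t + n < T:
            # Bootstrap from the fresh search value n steps ahead.
            target += discount ** n * root_values[t + n]
        values.append(target)
    return Targets(policy=policies, value=values)


if __name__ == "__main__":
    # Toy search that always prefers action 0 and values every state at 10.
    def toy_search(obs):
        return [3, 1], 10.0

    targets = reanalyse([0, 1, 2], [1.0, 1.0, 1.0], toy_search,
                        n=2, discount=0.5)
    print(targets.value)   # [4.0, 1.5, 1.0]
    print(targets.policy)  # [[0.75, 0.25], [0.75, 0.25], [0.75, 0.25]]
```

Because only the targets are recomputed, the same procedure applies whether the stored data came from the agent's own recent interactions (online) or from a fixed dataset (offline), which is the property the abstract highlights.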

Videos

Online and Offline Reinforcement Learning by Planning with a Learned Model· slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research

Methods: Batch Normalization · Prioritized Experience Replay · Residual Connection · Convolution · Average Pooling · Monte-Carlo Tree Search · Residual Block · MuZero