Online and Offline Reinforcement Learning by Planning with a Learned Model
Julian Schrittwieser, Thomas Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver

TL;DR
This paper introduces MuZero Unplugged, a unified reinforcement learning algorithm that effectively handles both online and offline settings using a model-based planning approach, achieving state-of-the-art results without special adaptations.
Contribution
The paper presents Reanalyse, a novel algorithm for data-efficient learning, and combines it with MuZero to create MuZero Unplugged, a versatile method for all data regimes in reinforcement learning.
Findings
Sets a new state of the art on the offline RL benchmark
Achieves top results on the Atari online RL benchmark
Operates effectively without environment interaction or special offline adaptations
Abstract
Learning efficiently from small amounts of data has long been the focus of model-based reinforcement learning, both for the online case when interacting with the environment and the offline case when learning from a fixed dataset. However, to date no single unified algorithm could demonstrate state-of-the-art results in both settings. In this work, we describe the Reanalyse algorithm which uses model-based policy and value improvement operators to compute new improved training targets on existing data points, allowing efficient learning for data budgets varying by several orders of magnitude. We further show that Reanalyse can also be used to learn entirely from demonstrations without any environment interactions, as in the case of offline Reinforcement Learning (offline RL). Combining Reanalyse with the MuZero algorithm, we introduce MuZero Unplugged, a single unified algorithm for any…
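The core idea of Reanalyse, recomputing training targets on data that has already been collected, can be illustrated with a minimal sketch. This is not the authors' implementation: MuZero plans with Monte-Carlo Tree Search over a learned model, whereas the sketch substitutes a one-step lookup over a hypothetical table of action values. The names `Sample`, `make_toy_search`, and `reanalyse` are assumptions introduced only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Sample:
    """One stored data point with the targets it was originally saved with."""
    state: int                  # toy stand-in for a stored observation
    policy_target: List[float]  # policy target from when the data was generated
    value_target: float         # value target from when the data was generated


# Hypothetical interface: state -> (improved policy, improved value).
SearchFn = Callable[[int], Tuple[List[float], float]]


def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_search(q_table: Dict[int, List[float]]) -> SearchFn:
    """Assumed stand-in for planning with the latest model.

    A real system would run a search with the current learned model here;
    this toy version just reads action values from a table.
    """
    def search(state: int) -> Tuple[List[float], float]:
        q = q_table[state]
        policy = softmax(q, temperature=0.1)  # sharpened policy target
        value = max(q)                        # greedy value target
        return policy, value
    return search


def reanalyse(buffer: List[Sample], search: SearchFn) -> List[Sample]:
    """Overwrite stored targets with freshly computed ones.

    No environment interaction happens: only existing samples are revisited.
    """
    for sample in buffer:
        sample.policy_target, sample.value_target = search(sample.state)
    return buffer


# Usage: a fixed (offline) buffer whose stale targets get refreshed.
buffer = [Sample(state=0, policy_target=[0.5, 0.5], value_target=0.0)]
latest_model_q = {0: [0.0, 1.0]}  # hypothetical values from an updated model
reanalyse(buffer, make_toy_search(latest_model_q))
```

Because the loop only reads stored samples, the same procedure applies whether the buffer is still growing from online interaction or is a fixed offline dataset, which is the property the abstract highlights.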
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research
Methods: Batch Normalization · Prioritized Experience Replay · Residual Connection · Convolution · Average Pooling · Monte-Carlo Tree Search · Residual Block · MuZero
