The Effectiveness of World Models for Continual Reinforcement Learning

Samuel Kessler; Mateusz Ostaszewski; Michał Bortkiewicz; Mateusz Żarski; Maciej Wołczyk; Jack Parker-Holder; Stephen J. Roberts; Piotr Miłoś

arXiv:2211.15944 · cs.LG · July 14, 2023 · 1 citation

TL;DR

This paper demonstrates that world models can be effectively adapted for continual reinforcement learning, improving performance and transfer in changing environments through selective experience replay and new modeling strategies.

Contribution

It introduces Continual-Dreamer, a task-agnostic world-model approach to continual RL that outperforms state-of-the-art task-agnostic continual RL methods on the Minigrid and Minihack benchmarks.

Findings

01 Continual-Dreamer is sample efficient.

02 It outperforms state-of-the-art task-agnostic continual RL methods on the Minigrid and Minihack benchmarks.

03 Selective experience replay enhances performance and transfer.

Abstract

World models power some of the most efficient reinforcement learning algorithms. In this work, we show that they can be harnessed for continual learning, a setting in which the agent faces changing environments. World models typically employ a replay buffer for training, which extends naturally to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer. We also provide recommendations on the modeling options available when using world models. We call the best set of choices Continual-Dreamer; it is task-agnostic and uses the world model for continual exploration. Continual-Dreamer is sample efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on the Minigrid and Minihack benchmarks.
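The abstract refers to selective experience replay without naming a specific strategy. As a rough illustration only, the sketch below uses reservoir sampling, one common way to keep a fixed-size buffer that stays representative of every task seen so far. The class name `ReservoirReplayBuffer`, its methods, and the toy `(task_id, step)` transitions are all hypothetical and are not taken from the paper or its repositories.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-capacity buffer holding a uniform random sample of all items added.

    Hypothetical illustration of reservoir sampling; not the paper's code.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.num_seen = 0  # total number of items ever offered to the buffer
        self.rng = random.Random(seed)

    def add(self, item):
        self.num_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not yet full: always store the item.
            self.items.append(item)
            return
        # Buffer full: keep the new item with probability capacity / num_seen,
        # overwriting a uniformly chosen stored item.
        slot = self.rng.randrange(self.num_seen)
        if slot < self.capacity:
            self.items[slot] = item

    def sample(self, batch_size):
        # Draw a training batch without replacement from the stored items.
        return self.rng.sample(self.items, min(batch_size, len(self.items)))


# Toy usage: three sequential "tasks", each producing 1000 transitions.
buffer = ReservoirReplayBuffer(capacity=100, seed=0)
for task_id in range(3):
    for step in range(1000):
        buffer.add((task_id, step))

tasks_in_buffer = {task_id for task_id, _ in buffer.items}
```

Unlike a first-in-first-out buffer, which would hold only transitions from the most recent task once it fills, this buffer retains data from earlier tasks without needing task labels or boundaries, which is why strategies of this kind suit a task-agnostic setting.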

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Advanced Bandit Algorithms Research

Methods: Experience Replay