Learning General World Models in a Handful of Reward-Free Deployments

Yingchen Xu; Jack Parker-Holder; Aldo Pacchiano; Philip J. Ball; Oleh Rybkin; Stephen J. Roberts; Tim Rocktäschel; Edward Grefenstette

arXiv:2210.12719·cs.LG·October 25, 2022·1 cites

TL;DR

This paper introduces CASCADE, a new self-supervised exploration method for reinforcement learning that efficiently learns general world models through diverse, task-agnostic data collection by multiple agents, enabling zero-shot generalization to new tasks.

Contribution

The paper proposes CASCADE, a novel population-based exploration approach that maximizes trajectory diversity for scalable, reward-free RL deployment and generalization.

Findings

01 CASCADE collects diverse, task-agnostic datasets effectively.

02 Agents trained with CASCADE generalize zero-shot to unseen tasks.

03 Theoretical analysis shows improved diversity over naive methods.

Abstract

Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research. We then present CASCADE, a novel approach for self-supervised exploration in this new setting. CASCADE seeks to learn a world model by collecting data with a population of agents, using an information theoretic objective inspired by Bayesian Active Learning. CASCADE achieves this by specifically maximizing the diversity of trajectories sampled by the population through a novel cascading objective. We provide…
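To make the idea of a population choosing diverse trajectories concrete, here is a minimal toy sketch in Python. It is not the CASCADE objective from the paper, which the abstract does not spell out. It only illustrates one plausible reading of "cascading": agents choose in sequence, and each agent is scored against the choices of the agents before it. The names `cascade_select`, `min_distance`, `info_gain`, and `diversity_weight` are hypothetical, trajectories are stood in for by small embedding tuples, and Euclidean distance is an assumed stand-in for whatever diversity measure the method actually uses.

```python
import math
import random


def min_distance(candidate, chosen):
    """Distance from a candidate to its nearest already-chosen embedding."""
    if not chosen:
        return 0.0
    return min(math.dist(candidate, c) for c in chosen)


def cascade_select(candidates_per_agent, info_gain, diversity_weight=1.0):
    """Greedy sequential selection (illustrative only, not the paper's method).

    candidates_per_agent: one list of trajectory embeddings (tuples of floats)
        per agent; a hypothetical stand-in for imagined rollouts.
    info_gain: callable mapping an embedding to a scalar; a hypothetical proxy
        for an information-theoretic exploration bonus.
    """
    chosen = []
    for candidates in candidates_per_agent:
        # Agent i is scored against the picks of agents 0..i-1 only.
        best = max(
            candidates,
            key=lambda z: info_gain(z) + diversity_weight * min_distance(z, chosen),
        )
        chosen.append(best)
    return chosen


# Small demo: 4 agents, 16 random 2-D candidate embeddings each.
rng = random.Random(0)
population = [
    [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(16)]
    for _ in range(4)
]
picks = cascade_select(population, info_gain=lambda z: 0.0)
for i, z in enumerate(picks):
    print(f"agent {i}: {z}")
```

With a constant `info_gain`, the first agent simply takes its first candidate and each later agent picks the candidate farthest from everything already chosen, so the population spreads out instead of collapsing onto one behavior.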

Videos

Learning General World Models in a Handful of Reward-Free Deployments · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Anomaly Detection Techniques and Applications