Learning General World Models in a Handful of Reward-Free Deployments
Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette

TL;DR
This paper introduces CASCADE, a new self-supervised exploration method for reinforcement learning that efficiently learns general world models through diverse, task-agnostic data collection by multiple agents, enabling zero-shot generalization to new tasks.
Contribution
The paper proposes CASCADE, a novel population-based exploration approach that maximizes trajectory diversity for scalable, reward-free RL deployment and generalization.
Findings
CASCADE collects diverse, task-agnostic datasets effectively.
Agents trained with CASCADE generalize zero-shot to unseen tasks.
Theoretical analysis shows improved diversity over naive methods.
Abstract
Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research. We then present CASCADE, a novel approach for self-supervised exploration in this new setting. CASCADE seeks to learn a world model by collecting data with a population of agents, using an information theoretic objective inspired by Bayesian Active Learning. CASCADE achieves this by specifically maximizing the diversity of trajectories sampled by the population through a novel cascading objective. We provide…
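The idea of a population whose members are chosen one after another, each conditioned on those already chosen, can be illustrated with a toy sketch. This is not the paper's objective: the information-theoretic term is replaced here by a plain Euclidean distance between hypothetical 2-D "trajectory embeddings", and the names `cascade_select` and `min_distance` are invented for illustration only.

```python
import math
import random


def min_distance(candidate, chosen):
    """Distance from a candidate embedding to its nearest already-chosen one."""
    return min(math.dist(candidate, c) for c in chosen)


def cascade_select(candidates, population_size):
    """Greedily pick up to `population_size` embeddings, one explorer at a time.

    Each new pick is conditioned on the picks before it: it maximizes the
    distance to the nearest previously selected embedding. This distance is
    an assumed stand-in for a diversity term, not the paper's objective.
    """
    chosen = [candidates[0]]  # first explorer has no predecessors to differ from
    remaining = list(candidates[1:])
    while len(chosen) < population_size and remaining:
        best = max(remaining, key=lambda c: min_distance(c, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2-D embeddings for 50 candidate explorers.
    candidates = [(rng.random(), rng.random()) for _ in range(50)]
    print(cascade_select(candidates, population_size=4))
```

The greedy, one-at-a-time structure is the point of the sketch: later explorers are pushed away from regions earlier explorers already cover, which is one simple way to read "maximizing the diversity of trajectories sampled by the population".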
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Anomaly Detection Techniques and Applications
