Replay-Guided Adversarial Environment Design
Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

TL;DR
This paper introduces a new perspective on environment design in reinforcement learning, unifying existing methods under a theoretical framework, and proposes a novel algorithm, PLR⊥, that improves out-of-distribution transfer performance.
Contribution
It formalizes Prioritized Level Replay as a form of Unsupervised Environment Design, develops a dual curriculum framework, and introduces PLR⊥, which enhances convergence and transfer in RL.
Findings
PLR⊥ outperforms existing methods on out-of-distribution transfer tasks.
Theoretical guarantees are established for a class of UED methods.
Stopping policy updates on uncurated levels improves convergence.
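The last finding can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes integer level IDs, a hypothetical `score_fn` standing in for a regret-style level score, score-proportional sampling, and lowest-score eviction, all chosen here for brevity. The point it shows is the control flow: the policy is updated only when a level is replayed from the curated buffer, while freshly generated levels are used only for scoring.

```python
import random


def robust_plr_step(buffer, scores, new_level, score_fn, replay_prob, rng, capacity=4):
    """One curriculum decision. Returns (level, update_policy).

    buffer, scores, new_level, score_fn and capacity are hypothetical
    stand-ins for illustration, not names from the paper.
    """
    if buffer and rng.random() < replay_prob:
        # Replay branch: pick a curated level, weighted by its score.
        # The small constant keeps the weights valid if every score is zero.
        weights = [scores[lvl] + 1e-6 for lvl in buffer]
        level = rng.choices(buffer, weights=weights, k=1)[0]
        return level, True  # train the policy on this level

    # Exploration branch: score a fresh random level, but do NOT train on it.
    level = new_level(rng)
    scores[level] = score_fn(level)
    if level not in buffer:
        buffer.append(level)
    if len(buffer) > capacity:
        # Keep the buffer bounded by evicting the lowest-scoring level.
        worst = min(buffer, key=lambda lvl: scores[lvl])
        buffer.remove(worst)
        del scores[worst]
    return level, False  # evaluation only, no gradient update
```

A caller would run the agent on the returned level in both cases, and apply a policy update only when the second return value is `True`.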
Abstract
Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm,…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
