Replay-Guided Adversarial Environment Design

Minqi Jiang; Michael Dennis; Jack Parker-Holder; Jakob Foerster; Edward Grefenstette; Tim Rocktäschel

arXiv:2110.02439 · cs.LG · January 17, 2022 · 1 citation

TL;DR

This paper introduces a new perspective on environment design in reinforcement learning, unifying existing methods under a theoretical framework, and proposes a novel algorithm, PLR⊥, that improves out-of-distribution transfer performance.

Contribution

It formalizes Prioritized Level Replay as a form of Unsupervised Environment Design, develops a dual curriculum framework, and introduces PLR⊥, which enhances convergence and transfer in RL.

Findings

01. PLR⊥ outperforms existing methods on out-of-distribution transfer tasks.

02. Theoretical guarantees are established for a class of UED methods.

03. Stopping policy updates on uncurated levels improves convergence.
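
The third finding can be illustrated with a minimal Python sketch of a replay-gated training step: the policy is updated only on levels replayed from a curated buffer, while freshly generated levels are only evaluated and scored. This is not the authors' implementation. The function name `robust_plr_step`, its parameters, the score-proportional sampling, and the lowest-score eviction rule are all assumptions made for illustration, and scores are assumed to be non-negative.

```python
import random


def robust_plr_step(buffer, sample_level, rollout, update_policy, score_fn,
                    replay_prob=0.5, capacity=4, rng=random):
    """One iteration of a replay-gated loop (illustrative sketch only).

    buffer maps level -> non-negative score. All names here are hypothetical.
    Returns (level, trained), where trained says whether the policy was updated.
    """
    if buffer and rng.random() < replay_prob:
        # Curated branch: replay a buffered level and train on it.
        # Sampling in proportion to score is a simplifying assumption.
        levels = list(buffer)
        weights = [buffer[lvl] + 1e-8 for lvl in levels]
        level = rng.choices(levels, weights=weights, k=1)[0]
        trajectory = rollout(level)
        update_policy(trajectory)
        trained = True
    else:
        # Uncurated branch: evaluate a new random level WITHOUT a policy update.
        level = sample_level()
        trajectory = rollout(level)
        trained = False

    # Both branches refresh the level's score in the buffer.
    buffer[level] = score_fn(trajectory)
    if len(buffer) > capacity:
        # Evict the lowest-scoring level (assumed eviction rule).
        del buffer[min(buffer, key=buffer.get)]
    return level, trained
```

Setting `replay_prob=0.0` in this sketch means the policy never trains and the buffer is filled purely by evaluation; any positive value interleaves evaluation of new levels with training on buffered ones.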

Abstract

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm,…

Code & Models

Videos

Replay-Guided Adversarial Environment Design · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification