Stabilizing Unsupervised Environment Design with a Learned Adversary

Ishita Mediratta; Minqi Jiang; Jack Parker-Holder; Michael Dennis; Eugene Vinitsky; Tim Rocktäschel

arXiv:2308.10797 · cs.LG · August 23, 2023 · 1 citation

Open Access · 1 Repo

TL;DR

This paper improves the PAIRED approach for Unsupervised Environment Design by addressing its shortcomings, enabling it to generate challenging environments that lead to more robust and general agents in complex tasks.

Contribution

It identifies key limitations of PAIRED and proposes solutions, allowing direct environment generation that surpasses existing methods in robustness and generalization.

Findings

01. PAIRED can now generate environments that improve agent robustness.

02. Enhanced PAIRED matches or exceeds state-of-the-art in challenging tasks.

03. Results demonstrate improved generalization in maze navigation and car racing environments.

Abstract

A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose…
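The teacher-student setup described in the abstract can be illustrated with a toy sketch. It assumes the regret-style teacher signal from the general PAIRED formulation (best return of a second "antagonist" student minus the average return of the "protagonist" student on the same task), which this page does not spell out. The function names, task names, and return values below are hypothetical and are not taken from the paper or from the facebookresearch/dcd repository.

```python
from statistics import mean


def estimate_regret(protagonist_returns, antagonist_returns):
    """Approximate regret on one task (assumed PAIRED-style estimate).

    Regret is taken as the antagonist's best episode return minus the
    protagonist's mean episode return on the same task.
    """
    return max(antagonist_returns) - mean(protagonist_returns)


def pick_task(rollouts):
    """Return the name of the candidate task with the highest estimated regret.

    `rollouts` maps a task name to a pair
    (protagonist_returns, antagonist_returns). A learned teacher would be
    trained with RL to propose such tasks; here we only rank fixed candidates.
    """
    return max(rollouts, key=lambda name: estimate_regret(*rollouts[name]))


# Hypothetical episode returns for three candidate tasks.
rollouts = {
    "too_easy": ([1.0, 1.0], [1.0, 1.0]),           # both students solve it
    "solvable_but_hard": ([0.2, 0.4], [0.9, 0.7]),  # only the antagonist does well
    "impossible": ([0.0, 0.0], [0.0, 0.0]),         # neither student succeeds
}

chosen = pick_task(rollouts)
```

Under this assumed signal, tasks that are trivially easy or unsolvable both yield zero regret, so the ranking favors tasks that are solvable but not yet mastered by the protagonist.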

Code & Models

Repositories

facebookresearch/dcd (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing