Stabilizing Unsupervised Environment Design with a Learned Adversary
Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel

TL;DR
This paper improves the PAIRED approach for Unsupervised Environment Design by addressing its shortcomings, enabling it to generate challenging environments that lead to more robust and general agents in complex tasks.
Contribution
The paper identifies key limitations of PAIRED and proposes fixes for them, so that a teacher which generates environments directly can surpass existing methods in robustness and generalization.
Findings
PAIRED can now generate environments that improve agent robustness.
The enhanced PAIRED matches or exceeds the state of the art on challenging tasks.
Results demonstrate improved generalization in maze navigation and car racing environments.
Abstract
A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose…
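To make the teacher's training signal concrete, here is a minimal sketch of the regret estimate as we understand it from the original PAIRED formulation (Dennis et al., 2020). The abstract above does not spell it out, so treat this as an assumption. In that formulation the teacher is rewarded by the gap between the best return of a second student (the antagonist) and the average return of the main student (the protagonist) on the proposed level. The function name, the level names, and the return values are all hypothetical and chosen only for illustration. This is not the improved method proposed in this paper.

```python
from statistics import mean


def estimate_regret(protagonist_returns, antagonist_returns):
    """Approximate regret on one level.

    Assumed form: best antagonist episode return minus the
    average protagonist episode return on the same level.
    """
    return max(antagonist_returns) - mean(protagonist_returns)


# Hypothetical episode returns in [0, 1] on three teacher-proposed levels.
levels = {
    "trivial": {"protagonist": [1.0, 1.0], "antagonist": [1.0, 1.0]},
    "unsolvable": {"protagonist": [0.0, 0.0], "antagonist": [0.0, 0.0]},
    "frontier": {"protagonist": [0.0, 0.5], "antagonist": [1.0, 0.5]},
}

# The teacher's reward for each level is the estimated regret.
teacher_rewards = {
    name: estimate_regret(r["protagonist"], r["antagonist"])
    for name, r in levels.items()
}

# The level the teacher is most rewarded for proposing.
best_level = max(teacher_rewards, key=teacher_rewards.get)
```

In this toy setup, both the trivial and the unsolvable level give zero regret, so the teacher is rewarded only for the level that is solvable but not yet mastered by the protagonist. That is the sense in which generated tasks are adapted to the agent's current capabilities.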
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing
