Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Michael Dennis; Natasha Jaques; Eugene Vinitsky; Alexandre Bayen; Stuart Russell; Andrew Critch; Sergey Levine

arXiv:2012.02096·cs.LG·February 5, 2021·22 cites

TL;DR

This paper introduces PAIRED, a novel unsupervised environment design method using a protagonist-antagonist setup to generate structured, solvable environments that facilitate curriculum learning and improve zero-shot transfer in reinforcement learning.

Contribution

The paper proposes PAIRED, an innovative approach that automatically generates environment distributions with structured difficulty, overcoming limitations of domain randomization and adversarial training.
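
As a rough illustration of the protagonist-antagonist idea, the environment-generating adversary in PAIRED is rewarded by regret: how much better the antagonist does than the protagonist on the same generated environment. The sketch below is not the authors' code. The function name `estimate_regret` and the sample returns are hypothetical, and it assumes regret is estimated as the best antagonist episode return minus the mean protagonist episode return.

```python
from statistics import mean
from typing import Sequence


def estimate_regret(antagonist_returns: Sequence[float],
                    protagonist_returns: Sequence[float]) -> float:
    """Regret estimate for one generated environment (illustrative only).

    Assumption: regret is approximated as the best antagonist episode
    return minus the average protagonist episode return.
    """
    if not antagonist_returns or not protagonist_returns:
        raise ValueError("need at least one episode return per agent")
    return max(antagonist_returns) - mean(protagonist_returns)


# Hypothetical episode returns collected on one adversary-built environment.
regret = estimate_regret([0.0, 1.0, 0.5], [0.0, 0.5, 0.25])

# If neither agent can score (an unsolvable environment), the estimate is zero.
unsolvable_regret = estimate_regret([0.0, 0.0], [0.0, 0.0])
```

Under this estimate, an environment that no agent can solve yields zero regret, so the adversary gains nothing by proposing it. An environment the antagonist solves but the protagonist does not yields high regret, which is what pushes generation toward solvable but challenging tasks.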

Findings

01 PAIRED produces a natural curriculum of increasing complexity.

02 Agents trained with PAIRED achieve higher zero-shot transfer performance.

03 The method effectively generates solvable, structured environments for reinforcement learning.

Abstract

A wide range of reinforcement learning (RL) problems - including robustness, transfer learning, unsupervised RL, and emergent complexity - require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error prone, and takes a significant amount of developer time and effort. We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. Existing approaches to automatically generating environments suffer from common failure modes: domain randomization cannot generate structure or adapt the difficulty of the environment to the agent's learning progress, and minimax adversarial training leads to worst-case environments…

Code & Models

Videos

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Software Engineering Research · Reinforcement Learning in Robotics

Methods: Protagonist Antagonist Induced Regret Environment Design