No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

Alexander Rutherford; Michael Beukman; Timon Willi; Bruno Lacerda; Nick Hawes; Jakob Foerster

arXiv:2408.15099 · cs.LG · October 31, 2024

TL;DR

This paper examines how existing Unsupervised Environment Design (UED) methods in reinforcement learning select training environments. It shows that their practical scoring functions correlate with success rate rather than regret, and proposes training on scenarios with high learnability to improve agent robustness.

Contribution

It identifies the mismatch between theoretical regret maximization and practical environment selection, and introduces a learnability-focused training method that outperforms existing UED techniques.

Findings

01 The practical regret approximations used by existing UED methods correlate with success rate, not regret.

02 Training on learnable scenarios improves robustness in multiple environments.

03 The proposed method outperforms current UED approaches in Minigrid and robotics-inspired tasks.

Abstract

What data or environments to use for training to improve downstream performance is a longstanding and very topical question in reinforcement learning. In particular, Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks. This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics. Surprisingly, despite methods aiming to maximise regret in theory, the practical approximations do not correlate with regret but with success rate. As a result, a significant portion of an agent's experience comes from environments it has already mastered, offering little to no contribution toward enhancing its abilities. Put differently, current methods fail to predict intuitive measures of "learnability." Specifically, they are…
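The idea of prioritising learnable environments over mastered ones can be illustrated with a minimal sketch. The truncated abstract above does not define the metric, so the score used here, p * (1 - p) for an empirical success rate p, is an assumption chosen because it is zero for environments the agent always or never solves and largest when the agent succeeds about half the time. The function names and the example success rates are hypothetical and are not taken from the authors' repository.

```python
def learnability(success_rate: float) -> float:
    """Assumed illustrative score: p * (1 - p).

    Zero when the agent always fails (p = 0) or always succeeds (p = 1),
    and maximal at p = 0.5.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must lie in [0, 1]")
    return success_rate * (1.0 - success_rate)


def top_k_learnable(success_rates: dict[str, float], k: int) -> list[str]:
    """Return the k environment names with the highest assumed learnability."""
    ranked = sorted(
        success_rates,
        key=lambda name: learnability(success_rates[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical per-environment success rates, e.g. estimated from rollouts.
rates = {"mastered": 1.0, "impossible": 0.0, "sometimes": 0.5, "mostly": 0.9}
selected = top_k_learnable(rates, 2)
print(selected)  # ['sometimes', 'mostly']
```

Under this assumed score, the mastered and impossible environments both receive zero priority, which matches the abstract's concern that experience on already-mastered environments contributes little to the agent's abilities.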

Code & Models

Repositories

amacrutherford/sampling-for-learnability (JAX, official)

Videos

No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery · slideslive

Taxonomy

Topics: Educational Assessment and Pedagogy · Educational Assessment and Improvement · Statistics Education and Methodologies

Methods: Softmax · Attention Is All You Need