Improving Regret Approximation for Unsupervised Dynamic Environment Generation

Harry Mead; Bruno Lacerda; Jakob Foerster; Nick Hawes

arXiv:2601.14957 · cs.LG · January 22, 2026

TL;DR

This paper introduces DEGen, which gives the level generator a denser reward signal to ease credit assignment, and MNA, a new regret approximation. Together they improve environment generation for reinforcement learning curricula and let UED scale to larger environments.

Contribution

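The paper proposes two methods for UED: DEGen, a dynamic environment generation approach, and MNA, a regret approximation that is better at identifying challenging levels. These improve both the detection of challenging levels and the scalability of RL training.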

Findings

01 MNA outperforms existing regret approximations.
02 DEGen enables scalable environment generation.
03 The combined approach improves RL generalisation.
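Finding 01 compares MNA against existing regret approximations. A level's true regret (the gap between an optimal policy's return and the agent's return) is unknown during training, so UED methods score levels with proxies computed from the agent's own rollouts. The sketch below implements two such proxies from earlier UED work, positive value loss (PVL) and MaxMC, roughly as they are usually defined. It is not MNA, whose definition is cut off in the abstract here, and the function names, signatures and toy episode are illustrative assumptions.

```python
import numpy as np


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalised advantage estimates for one episode on one level.

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`, the
    last being the bootstrap value of the final state (0.0 if terminal).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def positive_value_loss(rewards, values, gamma=0.99, lam=0.95):
    """PVL: average of the positive part of the GAE advantages."""
    return float(np.mean(np.maximum(gae(rewards, values, gamma, lam), 0.0)))


def max_mc(values, max_return):
    """MaxMC: average gap between the best return seen on the level and V(s_t)."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(max_return - values[:-1]))


# A three-step episode whose only reward arrives at the end.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
print(positive_value_loss(rewards, values, gamma=1.0, lam=1.0))  # 0.5
print(max_mc(values, max_return=1.0))                           # 0.5
```

Both scores grow when the agent's returns on a level exceed its own value estimates, which is taken as a sign that the level still has something to teach.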

Abstract

Unsupervised Environment Design (UED) seeks to automatically generate training curricula for reinforcement learning (RL) agents, with the goal of improving generalisation and zero-shot performance. However, designing effective curricula remains a difficult problem, particularly in settings where small subsets of environment parameterisations result in significant increases in the complexity of the required policy. Current methods struggle with a difficult credit assignment problem and rely on regret approximations that fail to identify challenging levels, both of which are compounded as the size of the environment grows. We propose Dynamic Environment Generation for UED (DEGen) to enable a denser level generator reward signal, reducing the difficulty of credit assignment and allowing UED to scale to larger environment sizes. We also introduce a new regret approximation, Maximised…
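The abstract argues that a denser generator reward makes credit assignment easier. As a generic illustration of that point, and not an implementation of DEGen, the sketch below compares the reward-to-go credited to each of five hypothetical generator actions when the generator receives a single reward at the end versus a reward after every action. The per-action contributions are made up for the example.

```python
import numpy as np


def reward_to_go(rewards, gamma=1.0):
    """G_t = r_t + gamma * G_{t+1}: the return credited to the action at step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical contribution of each of five generator actions to the level's
# difficulty; only the first and last actions matter.
contrib = np.array([2.0, 0.0, 0.0, 0.0, 1.0])

# Sparse: one reward equal to the total, delivered after the final action.
sparse = np.zeros_like(contrib)
sparse[-1] = contrib.sum()

# Dense: each action is rewarded for its own contribution when it happens.
dense = contrib

print(reward_to_go(sparse))  # [3. 3. 3. 3. 3.]
print(reward_to_go(dense))   # [3. 1. 1. 1. 1.]
```

With the sparse signal every action receives the same credit, so the generator cannot tell that the first action did most of the work; with the dense signal the first action stands out.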

Videos

Improving Regret Approximation for Unsupervised Dynamic Environment Generation (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games