PACE: Parameter Change for Unsupervised Environment Design

Fang Yuan; Quanjun Yin; Siqi Shen; Yuxiang Xie; Junqiang Yang; Long Qin; Junjie Zeng; Qinglun Li

arXiv:2605.01358·cs.LG·May 5, 2026

TL;DR

PACE evaluates training environments by the policy parameter change that training on them induces. This measures realized learning progress directly, avoids the high variance and computational cost of proxy signals, and improves reinforcement learning generalization.

Contribution

The paper proposes a low-variance, computationally efficient environment evaluation method grounded in realized learning progress, and reports that it outperforms existing UED approaches.

Findings

01. PACE outperforms baseline UED methods on MiniGrid and Craftax.

02. Achieves an IQM of 96.4% on MiniGrid.

03. Reduces the Optimality Gap to 17.2% on MiniGrid.
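
For readers unfamiliar with the two metrics above, here is a minimal sketch of how an interquartile mean (IQM) and an optimality gap are commonly computed from normalized scores. It assumes the standard definitions (IQM as the mean of the middle 50% of scores, optimality gap as the average shortfall below a target score). The page does not state how the paper computed them, and the `demo` scores are made-up illustration data, not results from the paper.

```python
import numpy as np

def interquartile_mean(scores):
    """Mean of the middle 50% of scores (lowest and highest 25% dropped)."""
    x = np.sort(np.asarray(scores, dtype=float).ravel())
    k = int(0.25 * x.size)          # number of scores trimmed from each end
    return float(x[k:x.size - k].mean())

def optimality_gap(scores, target=1.0):
    """Average amount by which scores fall short of `target` (scores above it are capped)."""
    x = np.asarray(scores, dtype=float).ravel()
    return float(target - np.minimum(x, target).mean())

# Hypothetical normalized solve rates for eight evaluation runs (illustration only).
demo = [0.0, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]
iqm = interquartile_mean(demo)
gap = optimality_gap(demo)
```

A higher IQM is better and a lower optimality gap is better, which is why the findings report one as "achieves" and the other as "reduces".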

Abstract

Unsupervised Environment Design (UED) offers a promising paradigm for improving reinforcement learning generalization by adaptively shaping training environments, but it requires reliable environment evaluation to remain effective. However, existing UED methods evaluate environments using indirect proxy signals such as regret, value-based errors, or Monte Carlo estimation, which suffer from bias, high variance, or substantial computational overhead and fail to reflect the agent's realized learning progress. To address these limitations, we propose Parameter Change Environment Design (PACE), which evaluates an environment through the policy parameter change induced by training on that environment, directly grounding environment selection in realized learning progress. Specifically, PACE assigns environment value using a first-order approximation of the policy optimization objective, where the improvement…
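
The general idea of scoring an environment by the parameter change it induces can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated before the exact scoring rule, so everything below is an assumption. Each "environment" is a hypothetical three-armed bandit defined by a reward vector, the policy is a softmax over logits `theta`, and the score is the generic first-order Taylor estimate of the objective gain, `grad J(theta) . delta`, where `delta` is the parameter change from one gradient-ascent step with an assumed learning rate `lr`.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy_gradient(theta, rewards):
    """Exact gradient of J(theta) = sum_a softmax(theta)[a] * rewards[a]."""
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)

def parameter_change_score(theta, rewards, lr=0.1):
    """Hypothetical score: first-order estimate of the gain from one update."""
    grad = policy_gradient(theta, rewards)
    delta = lr * grad                # parameter change induced by one gradient step
    return float(grad @ delta)       # equals ||delta||^2 / lr

theta = np.zeros(3)                  # uniform initial policy
envs = {                             # hypothetical toy environments
    "flat": np.array([1.0, 1.0, 1.0]),         # every action pays the same
    "informative": np.array([0.0, 0.0, 1.0]),  # one action is clearly better
}
scores = {name: parameter_change_score(theta, r) for name, r in envs.items()}
best = max(scores, key=scores.get)
```

In this toy setting the "flat" environment produces no gradient, so training on it would not move the policy and its score is zero, while the "informative" environment yields a nonzero parameter change and is selected. A real agent would estimate the gradient from sampled rollouts rather than compute it exactly.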
