PACE: Parameter Change for Unsupervised Environment Design
Fang Yuan, Quanjun Yin, Siqi Shen, Yuxiang Xie, Junqiang Yang, Long Qin, Junjie Zeng, Qinglun Li

TL;DR
PACE introduces a new environment evaluation method based on policy parameter change. It improves reinforcement learning generalization by measuring realized learning progress directly, avoiding the high variance and computational cost of proxy-based evaluation.
Contribution
It proposes a novel, low-variance, computation-efficient environment evaluation method grounded in realized learning progress, outperforming existing UED approaches.
Findings
PACE outperforms baseline UED methods on MiniGrid and Craftax.
PACE achieves an IQM of 96.4% on MiniGrid.
PACE reduces the Optimality Gap to 17.2% on MiniGrid.
Abstract
Unsupervised Environment Design (UED) offers a promising paradigm for improving reinforcement learning generalization by adaptively shaping training environments, but it remains effective only when environments are evaluated reliably. Existing UED methods evaluate environments using indirect proxy signals such as regret, value-based errors, or Monte Carlo estimates, which suffer from bias, high variance, or substantial computational overhead and fail to reflect the agent's realized learning progress. To address these limitations, we propose Parameter Change Environment Design (PACE), which evaluates an environment through the policy parameter change induced by training on that environment, directly grounding environment selection in realized learning progress. Specifically, PACE assigns environment value using a first-order approximation of the policy optimization objective, where the improvement…
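As a rough illustration of scoring an environment by the parameter change it induces, the toy sketch below takes one gradient-ascent step per environment and uses the first-order estimate grad · Δθ as the score. The abstract is cut off before it states the exact scoring rule, so this is an assumption for illustration, not the authors' method. The names `parameter_change_score` and `make_quadratic_env`, the quadratic objective, and the learning rate are all hypothetical.

```python
def make_quadratic_env(target):
    """Hypothetical toy environment with objective J(theta) = -0.5 * ||theta - target||^2.

    Returns a function that computes the gradient of J at theta.
    """
    def grad_fn(theta):
        return [t - p for t, p in zip(target, theta)]
    return grad_fn


def parameter_change_score(theta, grad_fn, lr=0.1):
    """Assumed score: first-order estimate of the objective gain, grad . delta_theta."""
    grad = grad_fn(theta)
    # Parameter change induced by one gradient-ascent step on this environment.
    delta = [lr * g for g in grad]
    # First-order Taylor estimate of J(theta + delta) - J(theta).
    return sum(g * d for g, d in zip(grad, delta))


theta = [0.0, 0.0]  # current policy parameters (toy, two-dimensional)
envs = {
    "near": make_quadratic_env([0.1, 0.0]),  # little left to learn here
    "far": make_quadratic_env([2.0, 1.0]),   # a step here moves the parameters a lot
}

scores = {name: parameter_change_score(theta, g) for name, g in envs.items()}
best = max(scores, key=scores.get)  # environment to prioritize for training
print(scores, best)
```

In this toy setting the environment whose update moves the parameters most receives the higher score, so it would be selected for training. A real UED loop would replace the quadratic gradient with the policy-gradient update computed from rollouts.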
