AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning

Jiayi Zhang; Yiran Peng; Fanqi Kong; Cheng Yang; Yifan Wu; Zhaoyang Yu; Jinyu Xiang; Jianhao Ruan; Jinlin Wang; Maojia Song; HongZhang Liu; Xiangru Tang; Bang Liu; Chenglin Wu; Yuyu Luo

arXiv:2511.19304 · cs.AI · December 4, 2025

TL;DR

AutoEnv introduces an automated framework and dataset for evaluating how agents learn across diverse, heterogeneous environments, revealing challenges and limitations in scalable cross-environment generalization.

Contribution

The paper presents AutoEnv, a novel automated environment generation framework, and AutoEnv-36, a dataset for testing agent learning across multiple heterogeneous worlds.

Findings

01. Fixed learning methods do not scale well across many environments.

02. Environment-adaptive method selection improves performance.

03. Performance gains diminish as the number of environments increases.

Abstract

Humans naturally adapt to diverse environments by learning underlying rules across worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically demonstrate improvements via self-evolving within a single domain, implicitly assuming a fixed environment distribution. Cross-environment learning has remained largely unmeasured: there is no standard collection of controllable, heterogeneous environments, nor a unified way to represent how agents learn. We address these gaps in two steps. First, we propose AutoEnv, an automated framework that treats environments as factorizable distributions over transitions, observations, and rewards, enabling low-cost (4.12 USD on average) generation of heterogeneous worlds. Using AutoEnv, we construct AutoEnv-36, a dataset of 36 environments with 358 validated levels, on which seven language models achieve…
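The abstract's idea of treating an environment as a factorizable combination of transitions, observations, and rewards can be illustrated with a small sketch. This is not the AutoEnv implementation: the `FactoredEnv` class, the five-cell walk, and every function name below are hypothetical, chosen only to show how swapping one factor while holding the others fixed yields a different world.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

State = int   # hypothetical: position on a line of five cells, 0..4
Action = int  # hypothetical: -1 (left) or +1 (right)


@dataclass(frozen=True)
class FactoredEnv:
    """Illustrative environment built from three independent factors."""
    transition: Callable[[State, Action, random.Random], State]
    observe: Callable[[State], str]
    reward: Callable[[State, Action, State], float]

    def step(self, state: State, action: Action,
             rng: random.Random) -> Tuple[State, str, float]:
        # Apply the three factors in order: dynamics, observation, reward.
        next_state = self.transition(state, action, rng)
        return (next_state,
                self.observe(next_state),
                self.reward(state, action, next_state))


def walk(state: State, action: Action, rng: random.Random) -> State:
    # Deterministic dynamics: move one cell, clipped to the range 0..4.
    return max(0, min(4, state + action))


def slippery_walk(state: State, action: Action, rng: random.Random) -> State:
    # Stochastic dynamics: the action is reversed with probability 0.2.
    if rng.random() < 0.2:
        action = -action
    return max(0, min(4, state + action))


def exact_obs(state: State) -> str:
    # Full observation: the agent sees its exact position.
    return f"position={state}"


def coarse_obs(state: State) -> str:
    # Partial observation: the agent only sees which side it is on.
    return "right side" if state >= 3 else "left side"


def reach_right(state: State, action: Action, next_state: State) -> float:
    # Reward for arriving at the rightmost cell.
    return 1.0 if next_state == 4 else 0.0


def reach_left(state: State, action: Action, next_state: State) -> float:
    # Inverted reward: arriving at the leftmost cell pays off instead.
    return 1.0 if next_state == 0 else 0.0


# Three worlds that share some factors and differ in others.
base_env = FactoredEnv(walk, exact_obs, reach_right)
inverted_env = FactoredEnv(walk, exact_obs, reach_left)
foggy_env = FactoredEnv(slippery_walk, coarse_obs, reach_right)
```

Calling `base_env.step(3, 1, random.Random(0))` and `inverted_env.step(3, 1, random.Random(0))` reaches the same next state with the same observation but a different reward, which is the sense in which only one factor has changed.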

Taxonomy

Topics: Language and cultural evolution · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics