ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training

Dunwei Tu; Hongyan Hao; Hansi Yang; Yihao Chen; Yi-Kai Zhang; Zhikang Xia; Yu Yang; Yueqing Sun; Xingchen Liu; Furao Shen; Qi Gu; Hui Su; Xunliang Cai

arXiv:2602.06820 · cs.AI · February 9, 2026

Open Access

TL;DR

ScaleEnv is a novel framework for creating diverse, reliable, and scalable interactive environments from scratch, significantly improving generalist agent training and generalization in multi-turn tool-use tasks.

Contribution

We introduce ScaleEnv, a method to generate fully interactive, verifiable environments from scratch, addressing limitations of existing synthesis approaches and enhancing agent learning and generalization.

Findings

01. Agents trained in ScaleEnv outperform baselines on unseen multi-turn tool-use benchmarks such as τ²-Bench and VitaBench.
02. Scaling environmental diversity improves model generalization.
03. ScaleEnv enables reliable environment and task creation from scratch.

Abstract

Training generalist agents capable of adapting to diverse scenarios requires interactive environments for self-exploration. However, interactive environments remain critically scarce, and existing synthesis methods suffer from significant limitations regarding environmental diversity and scalability. To address these challenges, we introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. Specifically, ScaleEnv ensures environment reliability through procedural testing, and guarantees task completeness and solvability via tool dependency graph expansion and executable action verification. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks such as τ²-Bench and VitaBench, highlighting strong generalization…
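
The abstract names two solvability mechanisms, tool dependency graph expansion and executable action verification, without giving their details. The sketch below illustrates the general idea under stated assumptions and is not ScaleEnv's implementation: each tool is assumed to declare named inputs and outputs, a goal tool is expanded backwards through the tools that produce its missing inputs into an ordered plan, and that plan is then replayed against a toy environment state to check that it actually executes and reaches the goal. The names `Tool`, `expand_plan`, `verify`, and the airline-style toy domain are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set

State = Dict[str, object]


@dataclass
class Tool:
    """Hypothetical tool schema: named inputs it consumes, named outputs it produces."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    run: Callable[[State], State]  # returns new key/value pairs to merge into the state


def expand_plan(goal: Tool, tools: List[Tool], known: Set[str]) -> List[Tool]:
    """Expand the dependency graph backwards from `goal` into an ordered tool plan.

    Raises ValueError if some required input has no producing tool (unsolvable task)
    or if the dependencies form a cycle.
    """
    producers: Dict[str, Tool] = {}
    for t in tools:
        for out in t.outputs:
            producers.setdefault(out, t)  # first declared producer wins (an assumption)

    available = set(known)  # facts the user/task provides up front
    plan: List[Tool] = []
    visiting: Set[str] = set()

    def visit(tool: Tool) -> None:
        if tool.name in visiting:
            raise ValueError(f"cyclic dependency at {tool.name}")
        if any(p.name == tool.name for p in plan):
            return  # already scheduled
        visiting.add(tool.name)
        for key in sorted(tool.inputs):
            if key not in available:
                producer = producers.get(key)
                if producer is None:
                    raise ValueError(f"no tool produces '{key}'; task is unsolvable")
                visit(producer)  # schedule the producer before this tool
        visiting.discard(tool.name)
        plan.append(tool)
        available.update(tool.outputs)

    visit(goal)
    return plan


def verify(plan: List[Tool], state: State, goal_check: Callable[[State], bool]) -> bool:
    """Replay the plan against a copy of the state; succeed only if every call runs and the goal holds."""
    state = dict(state)
    for tool in plan:
        if any(k not in state for k in tool.inputs):
            return False  # an argument the call needs is missing at execution time
        try:
            state.update(tool.run(state))
        except Exception:
            return False  # any execution error counts as a failed verification
    return goal_check(state)


# Toy backend data for a hypothetical airline-style domain.
USERS = {"alice": "u1"}
RESERVATIONS = {"u1": "r42"}

tools = [
    Tool("search_user", frozenset({"user_name"}), frozenset({"user_id"}),
         lambda s: {"user_id": USERS[s["user_name"]]}),
    Tool("get_reservation", frozenset({"user_id"}), frozenset({"reservation_id"}),
         lambda s: {"reservation_id": RESERVATIONS[s["user_id"]]}),
    Tool("cancel_reservation", frozenset({"reservation_id"}), frozenset({"cancelled"}),
         lambda s: {"cancelled": s["reservation_id"]}),
]

plan = expand_plan(tools[2], tools, known={"user_name"})
print([t.name for t in plan])  # ['search_user', 'get_reservation', 'cancel_reservation']

ok = verify(plan, {"user_name": "alice"}, lambda s: s.get("cancelled") == "r42")
print(ok)  # True
```

Separating expansion from verification mirrors the distinction the abstract draws: a plan can look complete on the dependency graph yet still fail when executed (for example, replaying the same plan for an unknown user raises a lookup error inside `run`, so `verify` returns `False`).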

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications