EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Xiaoshuai Song; Haofei Chang; Guanting Dong; Yutao Zhu; Ji-Rong Wen; Zhicheng Dou

arXiv:2601.05808 · cs.CL · April 20, 2026



TL;DR

EnvScaler is an automated framework that synthesizes diverse, scalable tool-interaction environments for training and evaluating LLM agents, significantly enhancing their performance in complex multi-tool tasks.

Contribution

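It introduces a novel, automated approach to generating diverse environments and scenarios for LLM training. The approach addresses three problems with existing tool-interaction sandboxes: access to real systems is often restricted, LLM-simulated environments are prone to hallucinations and inconsistencies, and manually built sandboxes are hard to scale.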

Findings

01. Synthesized 191 environments and about 7K scenarios for LLM training.

02. Training on these environments yields significant performance improvements on three benchmarks involving multi-tool interactions.

03. Code and data are publicly available on GitHub (RUC-NLPIR/EnvScaler).

Abstract

Large language models (LLMs) are expected to be trained to act as agents in various real-world environments, but this process relies on rich and varied tool-interaction sandboxes. However, access to real systems is often restricted; LLM-simulated environments are prone to hallucinations and inconsistencies; and manually built sandboxes are hard to scale. In this paper, we propose EnvScaler, an automated framework for scalable tool-interaction environments via programmatic synthesis. EnvScaler comprises two components. First, SkelBuilder constructs diverse environment skeletons through topic mining, logic modeling, and quality evaluation. Then, ScenGenerator generates multiple task scenarios and rule-based trajectory validation functions for each environment. With EnvScaler, we synthesize 191 environments and about 7K scenarios, and apply them to Supervised Fine-Tuning (SFT) and…
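To make "programmatic synthesis" concrete, here is a minimal sketch of the kind of artifact the abstract describes. It is an environment whose state and tool logic live in code, paired with a task scenario and a rule-based validation function that checks the final state. Everything in it (the library domain, `LibraryEnv`, `Scenario`, `run_trajectory`, the tool names) is a hypothetical illustration, not EnvScaler's actual code or data format. The real environment schema is in the RUC-NLPIR/EnvScaler repository.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical environment skeleton: explicit state plus tools that read or modify it.
@dataclass
class LibraryEnv:
    books: dict = field(default_factory=lambda: {"b1": 2, "b2": 0})  # book_id -> copies
    loans: list = field(default_factory=list)  # (user_id, book_id) pairs

    def check_availability(self, book_id: str) -> dict:
        if book_id not in self.books:
            return {"ok": False, "error": "unknown book"}
        return {"ok": True, "copies": self.books[book_id]}

    def borrow_book(self, user_id: str, book_id: str) -> dict:
        # The business rule is enforced in code, so the outcome cannot be hallucinated.
        if self.books.get(book_id, 0) <= 0:
            return {"ok": False, "error": "no copies available"}
        self.books[book_id] -= 1
        self.loans.append((user_id, book_id))
        return {"ok": True}


# Only these method names are exposed to the agent as tools (hypothetical registry).
TOOLS = {"check_availability", "borrow_book"}


# Hypothetical scenario: a task description plus a rule-based check on the final state.
@dataclass
class Scenario:
    task: str
    validate: Callable[[LibraryEnv], bool]


scenario = Scenario(
    task="User u1 wants to borrow book b1.",
    validate=lambda env: ("u1", "b1") in env.loans and env.books["b1"] == 1,
)


def run_trajectory(env: LibraryEnv, calls: list) -> list:
    """Replay a sequence of (tool_name, kwargs) calls against the environment."""
    results = []
    for name, kwargs in calls:
        if name not in TOOLS:
            results.append({"ok": False, "error": f"unknown tool {name}"})
            continue
        results.append(getattr(env, name)(**kwargs))
    return results


if __name__ == "__main__":
    env = LibraryEnv()
    run_trajectory(env, [
        ("check_availability", {"book_id": "b1"}),
        ("borrow_book", {"user_id": "u1", "book_id": "b1"}),
    ])
    print(scenario.validate(env))  # prints True
```

Tool effects are computed deterministically from the state, so the same call sequence always produces the same result. A rule-based validator can therefore score a trajectory from the final state alone, which is the property the abstract contrasts with LLM-simulated environments.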


Code & Models

Repositories

RUC-NLPIR/EnvScaler (GitHub)

