Procedural Environment Generation for Tool-Use Agents

Michael Sullivan; Mareike Hartmann; Alexander Koller

arXiv:2506.11045 · cs.LG · September 25, 2025

TL;DR

This paper presents RandomWorld, a procedural pipeline for generating interactive tools and compositional synthetic tool-use data. Models tuned on this data improve on a range of tool-use benchmarks and set a new state of the art for two metrics on the NESTFUL dataset.

Contribution

Introduces RandomWorld, a procedural pipeline that generates interactive tools and compositional tool-use data for SFT and RL training.

Findings

01

Models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks.

02

Downstream performance scales with the amount of RandomWorld-generated training data.

03

Sets a new state of the art for two metrics on the NESTFUL dataset.

Abstract

Although the power of LLM tool-use agents has ignited a flurry of recent research in this area, the curation of tool-use training data remains an open problem, especially for online RL training. Existing approaches to synthetic tool-use data generation tend to be non-interactive and/or non-compositional. We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks, and set a new SoTA for two metrics on the NESTFUL dataset. Further experiments show that downstream performance scales with the amount of RandomWorld-generated training data, opening up the possibility of further improvement through the use of entirely synthetic data.
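
To make the terms "interactive" and "compositional" concrete, here is a minimal Python sketch. It is not the RandomWorld pipeline: the `Tool` class, the four toy tools, and the `sample_chain` and `run_chain` helpers are all hypothetical names invented for illustration. The sketch assumes that a compositional task can be viewed as a chain of tools in which each tool's output type matches the next tool's input type, and that an interactive environment is one where every call is actually executed and returns a result the agent can observe.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Tool:
    """A hypothetical typed tool: one input type, one output type."""
    name: str
    in_type: str
    out_type: str
    fn: Callable[[Any], Any]


# Toy tool inventory (an assumption for this sketch, not from the paper).
TOOLS = [
    Tool("str_length", "str", "int", len),
    Tool("shout", "str", "str", str.upper),
    Tool("double", "int", "int", lambda x: 2 * x),
    Tool("to_text", "int", "str", str),
]


def sample_chain(rng: random.Random, start_type: str, length: int) -> List[Tool]:
    """Randomly compose tools so each output type feeds the next input type."""
    chain, current = [], start_type
    for _ in range(length):
        candidates = [t for t in TOOLS if t.in_type == current]
        tool = rng.choice(candidates)
        chain.append(tool)
        current = tool.out_type
    return chain


def run_chain(chain: List[Tool], value: Any) -> Tuple[Any, List[Tuple[str, Any, Any]]]:
    """Execute each tool for real and record (name, input, output) per step."""
    trace = []
    for tool in chain:
        result = tool.fn(value)
        trace.append((tool.name, value, result))
        value = result
    return value, trace


if __name__ == "__main__":
    chain = sample_chain(random.Random(0), "str", 3)
    final, trace = run_chain(chain, "tool")
    for name, arg, out in trace:
        print(f"{name}({arg!r}) -> {out!r}")
    print("gold answer:", repr(final))
```

Because every step is executed, the final value can serve as a checkable gold answer for the sampled task, which is the kind of signal an online RL setup needs. How RandomWorld itself generates tools, types, and tasks is described in the paper, not here.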
