Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Zhaoyang Wang; Canwen Xu; Boyi Liu; Yite Wang; Siwei Han; Zhewei Yao; Huaxiu Yao; Yuxiong He

arXiv:2602.10090 · cs.AI · February 12, 2026


Open Access · 3 Models · 1 Dataset

TL;DR

This paper introduces Agent World Model (AWM), a synthetic environment generation pipeline that creates diverse, reliable, and code-driven environments for training reinforcement learning agents, enabling better generalization and more efficient interactions.

Contribution

The paper presents AWM, a scalable pipeline for generating synthetic environments with rich toolsets, improving over LLM-based environments in reliability and efficiency for agent training.

Findings

01

Training in synthetic environments improves out-of-distribution generalization.

02

AWM enables large-scale reinforcement learning with reliable reward functions.

03

Synthetic environments outperform benchmark-specific environments in certain tasks.

Abstract

Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent World Model (AWM), a fully synthetic environment generation pipeline. Using this pipeline, we scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets (35 tools per environment on average) and obtain high-quality observations. Notably, these environments are code-driven and backed by databases, providing more reliable and consistent state transitions than environments simulated by LLMs. Moreover, they enable more efficient agent interaction compared with collecting trajectories from realistic environments. To demonstrate the effectiveness of…
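
To make "code-driven and backed by databases" concrete, here is a minimal sketch of what such an environment could look like. It is not taken from the paper or its released code: the `TodoEnv` class, its tool methods, the table schema, and the `reward` function are all hypothetical, chosen only to illustrate the idea that tools mutate a database deterministically and a reward can be computed by querying the final state rather than by asking an LLM.

```python
import sqlite3


class TodoEnv:
    """Hypothetical toy environment: state lives in SQLite, tools are plain methods."""

    def __init__(self):
        # In-memory database, so every episode starts from a clean state.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE tasks ("
            "id INTEGER PRIMARY KEY, title TEXT, done INTEGER DEFAULT 0)"
        )

    def add_task(self, title):
        """Tool: insert a task and return it as the observation."""
        cur = self.db.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
        self.db.commit()
        return {"id": cur.lastrowid, "title": title, "done": False}

    def complete_task(self, task_id):
        """Tool: mark a task as done; reports whether a row was updated."""
        cur = self.db.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
        self.db.commit()
        return {"ok": cur.rowcount == 1}

    def list_tasks(self):
        """Tool: return all tasks in insertion order."""
        rows = self.db.execute(
            "SELECT id, title, done FROM tasks ORDER BY id"
        ).fetchall()
        return [{"id": r[0], "title": r[1], "done": bool(r[2])} for r in rows]


def reward(env, title="buy milk"):
    """Hypothetical state-based reward: 1.0 if a task with this title is done."""
    row = env.db.execute(
        "SELECT COUNT(*) FROM tasks WHERE title = ? AND done = 1", (title,)
    ).fetchone()
    return 1.0 if row[0] > 0 else 0.0
```

Because every state transition is an ordinary SQL statement, replaying the same tool calls always yields the same database contents, which is the kind of consistency the abstract contrasts with LLM-simulated environments.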

Code & Models

Models

Datasets

Snowflake/AgentWorldModel-1K
dataset · 494 downloads

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications