Simulating Environments with Reasoning Models for Agent Training
Yuetai Li, Huseyin A Inan, Xiang Yue, Wei-Ning Chen, Lukas Wutschitz, Janardhan Kulkarni, Radha Poovendran, Robert Sim, Saravan Rajmohan

TL;DR
This paper introduces LLM-based simulation frameworks, Simia-SFT and Simia-RL, that enable scalable agent training without real environment data or APIs, improving robustness and reducing engineering effort.
Contribution
The paper presents novel frameworks for environment simulation using LLMs, allowing training without actual testbeds or environment implementations.
Findings
Fine-tuned open models show consistent improvements across multiple benchmarks and surpass GPT-4o on $\tau^2$-Bench.
Simia-RL achieves near state-of-the-art results on $\tau^2$-Bench.
Agent training scales without environment engineering.
Abstract
LLM agents excel in compact environments requiring deep reasoning but remain brittle when operating in broader, more complex contexts that demand robustness across diverse tools and schemas. Building bespoke environments for training is heavy, brittle, and limits progress. In this paper, we demonstrate that LLMs can simulate realistic environment feedback without access to actual testbed data or APIs. Inspired by this capability, we propose two frameworks: Simia-SFT, a pipeline that synthesizes SFT data by amplifying small seed sets into diverse trajectories in an environment-agnostic manner, and Simia-RL, a framework that enables RL training without real environment implementations through LLM-simulated feedback. Fine-tuning open models yields consistent improvements across multiple benchmarks, surpassing GPT-4o and approaching o4-mini on $\tau^2$-Bench. Together, Simia-SFT and…
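The idea of training against LLM-simulated feedback instead of a real environment can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SimulatedEnv`, `stub_simulator`, and `rollout`, the `DONE` prefix used to signal success, and the 0/1 reward are all hypothetical choices made for illustration. In practice the `simulator` callable would wrap a call to a reasoning model; here it is a deterministic stub so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
# Hypothetical interface: any callable that maps the conversation so far
# to a piece of simulated environment feedback (e.g. a wrapped LLM call).
Simulator = Callable[[List[Message]], str]


@dataclass
class SimulatedEnv:
    """Environment whose observations come from a simulator, not a real API."""
    simulator: Simulator
    max_turns: int = 4
    history: List[Message] = field(default_factory=list)

    def reset(self, task: str) -> str:
        self.history = [{"role": "task", "content": task}]
        return task

    def step(self, action: str) -> Tuple[str, float, bool]:
        self.history.append({"role": "agent", "content": action})
        observation = self.simulator(self.history)
        self.history.append({"role": "environment", "content": observation})
        agent_turns = sum(m["role"] == "agent" for m in self.history)
        # Assumed convention: the simulator prefixes its reply with "DONE"
        # when it judges the task complete; that also yields the reward.
        success = observation.startswith("DONE")
        done = success or agent_turns >= self.max_turns
        return observation, (1.0 if success else 0.0), done


def stub_simulator(history: List[Message]) -> str:
    """Deterministic stand-in for an LLM simulator (illustration only)."""
    last_action = history[-1]["content"]
    if last_action.startswith("submit"):
        return "DONE: task accepted"
    return f"OK: executed {last_action}"


def rollout(env: SimulatedEnv, policy: Callable[[str], str], task: str):
    """Run one episode and return the total reward and the full trajectory."""
    observation = env.reset(task)
    total, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total += reward
    return total, env.history
```

A trajectory collected this way could serve either as supervised fine-tuning data or as a reward-bearing episode for an RL update; which of the two applies depends on the training setup, which this sketch leaves out.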
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Intelligent Tutoring Systems and Adaptive Learning · AI-based Problem Solving and Planning
