AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning

Shihao Cai; Runnan Fang; Jialong Wu; Baixuan Li; Xinyu Wang; Yong Jiang; Liangcai Su; Liwen Zhang; Wenbiao Yin; Zhen Zhang; Fuli Feng; Pengjun Xie; Xiaobin Wang

arXiv:2512.22857·cs.CL·December 30, 2025

Open Access

TL;DR

AutoForge introduces an automated, scalable pipeline for synthesizing challenging simulated environments and an environment-level RL algorithm that enhances training stability and efficiency for agentic reinforcement learning.

Contribution

The paper presents a unified pipeline for automated environment synthesis, together with an environment-level RL algorithm that mitigates the instability of simulated users and improves training efficiency and stability.

Findings

01. Effective environment synthesis for high-difficulty tasks

02. Improved training stability and efficiency in agentic RL

03. Strong out-of-domain generalization demonstrated

Abstract

Conducting reinforcement learning (RL) in simulated environments offers a cost-effective and highly scalable way to enhance language-based agents. However, previous work has been limited to semi-automated environment synthesis or to tasks lacking sufficient difficulty, offering little breadth or depth. In addition, the instability of simulated users integrated into these environments, along with the heterogeneity across simulated environments, poses further challenges for agentic RL. In this work, we propose: (1) a unified pipeline for automated and scalable synthesis of simulated environments associated with high-difficulty but easily verifiable tasks; and (2) an environment-level RL algorithm that not only effectively mitigates user instability but also performs advantage estimation at the environment level, thereby improving training efficiency and stability. Comprehensive evaluations…
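To make "advantage estimation at the environment level" concrete, here is a minimal sketch of one plausible reading: each rollout's reward is normalized against the mean and standard deviation of all rollouts collected in the same environment. The abstract does not give the paper's formula, so the function name `environment_level_advantages`, the `(env_id, reward)` input format, and the normalization itself are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from statistics import mean, pstdev


def environment_level_advantages(rollouts, eps=1e-6):
    """Return one advantage per rollout, normalized within its environment.

    rollouts: list of (env_id, reward) pairs (hypothetical input format).
    eps: small constant that avoids division by zero when all rewards
         in an environment are identical.
    """
    # Group rewards by the environment that produced them.
    rewards_by_env = defaultdict(list)
    for env_id, reward in rollouts:
        rewards_by_env[env_id].append(reward)

    # Per-environment baseline (mean) and scale (population std).
    stats = {
        env_id: (mean(rewards), pstdev(rewards))
        for env_id, rewards in rewards_by_env.items()
    }

    # Assumed form: (reward - env mean) / (env std + eps).
    return [
        (reward - stats[env_id][0]) / (stats[env_id][1] + eps)
        for env_id, reward in rollouts
    ]


rollouts = [("env_a", 1.0), ("env_a", 0.0), ("env_b", 1.0), ("env_b", 1.0)]
advantages = environment_level_advantages(rollouts)
```

In this toy input, the two `env_a` rollouts receive advantages of opposite sign, while both `env_b` rollouts receive zero because their rewards are identical. Under this assumed scheme, an environment where every rollout earns the same reward contributes no gradient signal.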

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Topic Modeling