GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Jiacheng Guo; Ling Yang; Peter Chen; Qixin Xiao; Yinjie Wang; Xinzhe Juan; Jiahao Qiu; Ke Shen; Mengdi Wang

arXiv:2512.19682 · cs.CL · December 24, 2025


TL;DR

GenEnv introduces a dynamic co-evolution framework in which an environment simulator generates tasks tailored to an LLM agent, improving performance data-efficiently by aligning task difficulty with the agent's current capabilities.

Contribution

The paper presents GenEnv, a novel co-evolutionary framework that dynamically generates tasks to adaptively train LLM agents, reducing data requirements and enhancing scalability.

Findings

01. Up to +40.3% performance improvement over 7B baselines

02. Matches or exceeds the performance of larger models

03. Uses 3.3× less data than offline data-augmentation methods

Abstract

Training capable Large Language Model (LLM) agents is critically bottlenecked by the high cost and static nature of real-world interaction data. We address this by introducing GenEnv, a framework that establishes a difficulty-aligned co-evolutionary game between an agent and a scalable, generative environment simulator. Unlike traditional methods that evolve models on static datasets, GenEnv evolves the data itself: the simulator acts as a dynamic curriculum policy, continuously generating tasks tailored to the agent's "zone of proximal development". This process is guided by a simple but effective α-Curriculum Reward, which aligns task difficulty with the agent's current capabilities. We evaluate GenEnv on five benchmarks: API-Bank, ALFWorld, BFCL, Bamboogle, and TravelPlanner. Across these tasks, GenEnv improves agent performance by up to…
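The abstract does not give the exact form of the α-Curriculum Reward. As a hedged illustration only, the Python sketch below assumes one plausible shape: the simulator is scored by how close the agent's empirical success rate on a generated task comes to a target rate α, using a Gaussian-style penalty with a hypothetical sharpness parameter `beta`. The names `alpha_curriculum_reward` and `rank_tasks`, the default values, and the penalty shape are all assumptions for illustration and are not taken from the paper or its code.

```python
import math
from typing import List, Mapping, Sequence


def alpha_curriculum_reward(
    outcomes: Sequence[bool], alpha: float = 0.5, beta: float = 10.0
) -> float:
    """Hypothetical simulator reward (assumed form, not the paper's formula).

    Returns 1.0 when the agent's empirical success rate on a generated task
    equals the target alpha, and decays toward 0 as the task becomes too easy
    (success rate near 1) or too hard (success rate near 0).
    """
    if not outcomes:
        raise ValueError("need at least one rollout outcome")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Fraction of the agent's rollouts on this task that succeeded.
    success_rate = sum(outcomes) / len(outcomes)
    # Gaussian-style penalty on the distance from the target difficulty.
    return math.exp(-beta * (success_rate - alpha) ** 2)


def rank_tasks(
    rollouts: Mapping[str, Sequence[bool]], alpha: float = 0.5
) -> List[str]:
    """Order candidate tasks so those closest to the target success rate come first."""
    return sorted(
        rollouts,
        key=lambda task: alpha_curriculum_reward(rollouts[task], alpha),
        reverse=True,
    )


# Usage: three hypothetical generated tasks, each attempted four times.
rollouts = {
    "easy": [True, True, True, True],
    "hard": [False, False, False, False],
    "frontier": [True, False, True, False],
}
print(alpha_curriculum_reward(rollouts["frontier"]))  # 1.0
print(rank_tasks(rollouts)[0])  # frontier
```

Under this assumed form with α = 0.5, a task the agent always solves and a task it never solves receive the same low reward, so the simulator is pushed toward tasks at the edge of the agent's ability rather than toward uniformly easy or uniformly hard ones.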

