APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents
Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi

TL;DR
APEX introduces a strategy map-based approach to enhance exploration in self-evolving LLM agents, enabling sustained discovery and improved performance in complex tasks.
Contribution
It proposes a novel explicit strategy space with Fork Discovery and Policy Selection to prevent exploration collapse in self-evolving agents.
Findings
APEX outperforms all baselines on Jericho and WebArena benchmarks.
Extensive ablations confirm each component's importance.
APEX demonstrates robustness across diverse environments.
Abstract
LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requiring model-weight updates. However, these agents often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. To address this problem, we propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map, a directed acyclic graph of milestones with prerequisite dependency edges. In APEX, Fork Discovery expands the map with evidence-grounded unexplored directions, while Policy Selection balances exploration and…
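To make the strategy-map idea concrete, here is a minimal sketch of a milestone graph with prerequisite edges. It is an illustration only, not the authors' implementation: the class name StrategyMap, its methods, and the milestone names are all hypothetical, and the sketch covers only the data structure, not Fork Discovery or Policy Selection.

```python
from dataclasses import dataclass, field


@dataclass
class StrategyMap:
    """Hypothetical milestone DAG: each milestone maps to its prerequisites."""
    prereqs: dict[str, set[str]] = field(default_factory=dict)

    def add_milestone(self, name: str, requires: tuple[str, ...] = ()) -> None:
        # Prerequisites must already be in the map, so every edge points to an
        # older node and the graph stays acyclic by construction.
        if name in self.prereqs:
            raise ValueError(f"duplicate milestone: {name}")
        missing = [r for r in requires if r not in self.prereqs]
        if missing:
            raise KeyError(f"unknown prerequisites: {missing}")
        self.prereqs[name] = set(requires)

    def frontier(self, achieved: set[str]) -> list[str]:
        # Milestones not yet achieved whose prerequisites are all achieved.
        return sorted(
            m for m, reqs in self.prereqs.items()
            if m not in achieved and reqs <= achieved
        )


# Illustrative milestones only; they are not taken from the paper.
smap = StrategyMap()
smap.add_milestone("find_lamp")
smap.add_milestone("enter_cellar", requires=("find_lamp",))
smap.add_milestone("open_mailbox")
print(smap.frontier({"find_lamp"}))  # ['enter_cellar', 'open_mailbox']
```

Requiring prerequisites to exist before a milestone is added is one simple way to guarantee acyclicity without a separate cycle check; a map that allows edges to be added later would need an explicit check instead.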
