SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
Xinshun Feng, Xinhao Song, Lijun Li, Gongshen Liu, Jing Shao

TL;DR
SEARL introduces a structured experience memory framework for self-evolving agents, enhancing learning efficiency and generalization in resource-limited environments by integrating planning, execution, and explicit knowledge extraction.
Contribution
The paper presents a Tool-Memory-based framework that builds a structured experience memory to improve learning in self-evolving agents, addressing both reward sparsity and resource constraints.
Findings
Effective in knowledge reasoning tasks
Improves learning efficiency in mathematics tasks
Facilitates tool reuse and generalization
Abstract
Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have demonstrated significant potential in single-turn reasoning tasks. With the paradigm shift toward self-evolving agentic learning, models are increasingly expected to learn from trajectories by synthesizing tools or accumulating explicit experiences. However, prevailing methods typically rely on large-scale LLMs or multi-agent frameworks, which hinder their deployment in resource-constrained environments. The inherent sparsity of outcome-based rewards also poses a substantial challenge, as agents typically receive feedback only upon completion of tasks. To address these limitations, we introduce a Tool-Memory based self-evolving agentic framework SEARL. Unlike approaches that directly utilize interaction experiences, our method constructs a structured experience memory that integrates planning with execution.…
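The abstract describes the memory only at a high level, so what follows is a hypothetical Python sketch of one way a tool-graph experience memory could be organized. It is not SEARL's actual design. It assumes that tools are stored as named nodes, that each trajectory's plan is an ordered list of tool names, and that only successful (outcome-rewarded) trajectories update the graph. The class ToolGraphMemory and its methods add_tool, record_trajectory, and suggest_next are illustrative names introduced here.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


class ToolGraphMemory:
    """Hypothetical tool-graph memory (illustrative only, not SEARL's code).

    Nodes are reusable tools; a directed edge src -> dst counts how often
    dst followed src in the plan of a successful trajectory.
    """

    def __init__(self) -> None:
        self.tools: dict[str, str] = {}  # tool name -> source code or description
        self.edges: defaultdict[str, defaultdict[str, int]] = defaultdict(
            lambda: defaultdict(int)
        )

    def add_tool(self, name: str, source: str) -> None:
        # Register a tool (e.g. one synthesized during an episode) for later reuse.
        self.tools[name] = source

    def record_trajectory(self, plan: list[str], success: bool) -> None:
        # Assumption: only trajectories with a positive outcome reward update
        # the graph, since feedback arrives only when the task completes.
        if not success:
            return
        for src, dst in zip(plan, plan[1:]):
            self.edges[src][dst] += 1

    def suggest_next(self, current: str) -> Optional[str]:
        # Return the most frequently observed successor of `current`, or None.
        # .get avoids creating an empty entry when querying an unseen tool.
        successors = self.edges.get(current)
        if not successors:
            return None
        return max(successors, key=successors.get)


if __name__ == "__main__":
    memory = ToolGraphMemory()
    memory.add_tool("parse_question", "def parse_question(q): ...")
    memory.add_tool("solve_equation", "def solve_equation(expr): ...")
    memory.add_tool("format_answer", "def format_answer(x): ...")
    memory.record_trajectory(["parse_question", "solve_equation", "format_answer"], success=True)
    memory.record_trajectory(["parse_question", "format_answer"], success=False)
    print(memory.suggest_next("parse_question"))  # solve_equation
```

Gating updates on success mirrors the outcome-only feedback the abstract describes, and the graph lets a later episode reuse a tool sequence that worked before. The sketch does not address the credit-assignment problem that sparse rewards create.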
