SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents

Xinshun Feng; Xinhao Song; Lijun Li; Gongshen Liu; Jing Shao

arXiv:2604.07791·cs.AI·April 21, 2026

TL;DR

SEARL introduces a structured experience memory framework for self-evolving agents, enhancing learning efficiency and generalization in resource-limited environments by integrating planning, execution, and explicit knowledge extraction.

Contribution

The paper presents a Tool-Memory-based framework that constructs a structured experience memory for self-evolving agents, addressing reward sparsity and resource constraints.

Findings

01 Effective in knowledge reasoning tasks

02 Improves learning efficiency in mathematics tasks

03 Facilitates tool reuse and generalization

Abstract

Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have demonstrated significant potential in single-turn reasoning tasks. With the paradigm shift toward self-evolving agentic learning, models are increasingly expected to learn from trajectories by synthesizing tools or accumulating explicit experiences. However, prevailing methods typically rely on large-scale LLMs or multi-agent frameworks, which hinders their deployment in resource-constrained environments. The inherent sparsity of outcome-based rewards also poses a substantial challenge, as agents typically receive feedback only upon completion of tasks. To address these limitations, we introduce SEARL, a Tool-Memory-based self-evolving agentic framework. Unlike approaches that directly utilize interaction experiences, our method constructs a structured experience memory that integrates planning with execution.…
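To make the reward-sparsity point in the abstract concrete, here is a minimal sketch of what an outcome-based reward looks like over a multi-step trajectory. It is an illustration only, not SEARL's reward design: the function names `outcome_rewards` and `discounted_returns`, the 0/1 reward values, and the discount factor are all assumptions introduced for this example.

```python
from typing import List


def outcome_rewards(num_steps: int, success: bool) -> List[float]:
    """Per-step rewards when only the final outcome is verified.

    Hypothetical helper: every intermediate step gets 0.0, and only the
    last step carries the (assumed binary) outcome signal.
    """
    if num_steps <= 0:
        return []
    rewards = [0.0] * num_steps
    rewards[-1] = 1.0 if success else 0.0
    return rewards


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Standard discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    rewards = outcome_rewards(num_steps=4, success=True)
    print(rewards)
    print(discounted_returns(rewards, gamma=0.5))
```

In this sketch a failed trajectory yields all-zero rewards, so no individual step receives any learning signal. That is the kind of sparsity the abstract refers to when it says agents receive feedback only upon task completion.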
