EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

Yuyao Wang; Zhongjian Zhang; Mo Chi; Kaichi Yu; Yuhan Li; Miao Peng; Bing Tong; Chen Zhang; Yan Zhou; Jia Li

arXiv:2605.18421 · cs.CL · May 19, 2026

TL;DR

EvoMemBench is a unified benchmark for evaluating memory mechanisms in LLM agents. Its results show that current memory systems are still far from a general solution.

Contribution

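The paper presents EvoMemBench, a systematic benchmark for agent memory that compares 15 representative memory methods with long-context baselines along two axes: memory scope (in-episode vs. cross-episode) and memory content (knowledge-oriented vs. execution-oriented).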

Findings

1. Long-context baselines are highly competitive.
2. Memory is most helpful when context is insufficient or tasks are difficult.
3. No single memory method outperforms others across all settings.

Abstract

Recent benchmarks for Large Language Model (LLM) agents mainly evaluate reasoning, planning, and execution. However, memory is also essential for agents, as it enables them to store, update, and retrieve information over time. This ability remains under-evaluated, largely because existing benchmarks do not provide a systematic way to assess memory mechanisms. In this paper, we study agent memory from a self-evolving perspective and introduce EvoMemBench, a unified benchmark organized along two axes: memory scope (in-episode vs. cross-episode) and memory content (knowledge-oriented vs. execution-oriented). We compare 15 representative memory methods with strong long-context baselines under a standardized protocol. Results show that current memory systems are still far from a general solution: long-context baselines remain highly competitive, memory helps most when the current context is…
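The two axes described in the abstract can be pictured as a 2x2 grid of evaluation settings. The following is a minimal illustrative sketch only; the names `MemoryScope`, `MemoryContent`, `Setting`, and `all_settings` are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class MemoryScope(Enum):
    """Hypothetical enum for the 'memory scope' axis named in the abstract."""
    IN_EPISODE = "in-episode"
    CROSS_EPISODE = "cross-episode"


class MemoryContent(Enum):
    """Hypothetical enum for the 'memory content' axis named in the abstract."""
    KNOWLEDGE = "knowledge-oriented"
    EXECUTION = "execution-oriented"


@dataclass(frozen=True)
class Setting:
    """One cell of the scope x content grid (illustrative, not the benchmark's API)."""
    scope: MemoryScope
    content: MemoryContent

    def label(self) -> str:
        return f"{self.scope.value} / {self.content.value}"


def all_settings() -> list:
    """Enumerate every scope/content combination in definition order."""
    return [Setting(s, c) for s, c in product(MemoryScope, MemoryContent)]


if __name__ == "__main__":
    for setting in all_settings():
        print(setting.label())
```

Under these assumptions the grid has four cells, and a memory method would be scored separately in each one, which is consistent with the finding that no single method wins across all settings.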

Code & Models

Repositories

DSAIL-Memory/EvoMemBench (GitHub)
