MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents

Haoran Tan; Zeyu Zhang; Chen Ma; Xu Chen; Quanyu Dai; Zhenhua Dong

arXiv:2506.21605·cs.CL·June 30, 2025

TL;DR

This paper introduces MemBench, a benchmark and dataset for evaluating the memory capabilities of LLM-based agents across multiple memory levels and interactive scenarios. It targets two limitations of earlier evaluations: limited diversity in memory levels and scenarios, and a lack of comprehensive metrics.

Contribution

The paper presents a new dataset and benchmark that assess factual and reflective memory in LLM agents through diverse interactive scenarios and metrics.

Findings

01 The dataset covers two memory levels: factual and reflective.

02 The benchmark evaluates memory effectiveness, efficiency, and capacity.

03 Resources are publicly available on GitHub.

Abstract

Recent works have highlighted the significance of memory mechanisms in LLM-based agents, which enable them to store observed information and adapt to dynamic environments. However, evaluating their memory capabilities remains challenging. Previous evaluations are commonly limited in the diversity of their memory levels and interactive scenarios. They also lack comprehensive metrics that reflect memory capabilities from multiple aspects. To address these problems, in this paper, we construct a more comprehensive dataset and benchmark to evaluate the memory capability of LLM-based agents. Our dataset incorporates factual memory and reflective memory as different memory levels, and proposes participation and observation as distinct interactive scenarios. Based on our dataset, we present a benchmark, named MemBench, to evaluate the memory capability of LLM-based agents from multiple aspects,…

Code & Models

Repositories

import-myself/membench (official repository)

Taxonomy

Topics: Reinforcement Learning in Robotics · Multi-Agent Systems and Negotiation · Multimodal Machine Learning Applications