MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems

Qingyao Ai; Yichen Tang; Changyue Wang; Jianming Long; Weihang Su; Yiqun Liu

arXiv:2510.17281 · cs.LG · May 12, 2026

TL;DR

MemoryBench is a new benchmark designed to evaluate the memory and continual learning capabilities of large language models across diverse tasks, domains, and languages, addressing limitations of existing benchmarks.

Contribution

The paper introduces a comprehensive benchmark with a user feedback simulation framework to assess LLMs' continual learning abilities beyond traditional reading comprehension tasks.

Findings

01. State-of-the-art models perform poorly on the new benchmark.

02. The benchmark reveals gaps in current LLM memory and learning capabilities.

03. Experiments highlight the need for improved algorithms for continual learning.

Abstract

Scaling up data, parameters, and test-time computation has been the mainstream approach to improving LLM systems (LLMsys), but its upper bounds are almost reached due to the gradual depletion of high-quality data and the marginal gains obtained from larger computational resource consumption. Inspired by the ability of humans and traditional AI systems to learn from practice, constructing memory and continual learning frameworks for LLMsys has become an important and popular research direction in recent literature. Yet existing benchmarks for LLM memory often focus on evaluating the system on homogeneous reading comprehension tasks with long-form inputs rather than testing its ability to learn from accumulated user feedback during service time. Therefore, we propose a user feedback simulation framework and a comprehensive benchmark covering multiple domains, languages, and types of tasks…
