EvolMem: A Cognitive-Driven Benchmark for Multi-Session Dialogue Memory

Ye Shen; Dun Pei; Yiqiu Guo; Junying Wang; Yijin Guo; Zicheng Zhang; Qi Jia; Jun Zhou; Guangtao Zhai

arXiv:2601.03543 · cs.CL · January 8, 2026

TL;DR

EvolMem is a comprehensive benchmark, inspired by cognitive psychology, for evaluating the multi-session conversational memory of large language models across multiple memory dimensions. It reveals current limitations in both memory capability and memory efficiency.

Contribution

The paper introduces EvolMem, a novel, scalable benchmark for assessing multi-session dialogue memory that covers fine-grained memory abilities inspired by cognitive psychology.

Findings

1. No LLM outperforms the others across all memory dimensions.
2. Agent memory mechanisms do not always improve LLM capabilities.
3. Memory mechanisms often face efficiency limitations.

Abstract

Despite recent advances in understanding and leveraging long-range conversational memory, existing benchmarks still lack systematic evaluation of large language models (LLMs) across diverse memory dimensions, particularly in multi-session settings. In this work, we propose EvolMem, a new benchmark for assessing the multi-session memory capabilities of LLMs and agent systems. EvolMem is grounded in cognitive psychology and encompasses both declarative and non-declarative memory, each further decomposed into multiple fine-grained abilities. To construct the benchmark, we introduce a hybrid data synthesis framework that combines topic-initiated generation with narrative-inspired transformations. This framework enables scalable generation of multi-session conversations with controllable complexity, each accompanied by sample-specific evaluation guidelines. Extensive evaluation reveals that no LLM…
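To make the idea of "a multi-session conversation accompanied by a sample-specific evaluation guideline" concrete, here is a minimal, hypothetical record schema. Only the declarative/non-declarative split comes from the abstract; the class names (`Turn`, `Session`, `MemorySample`), the field names, and the helper methods are illustrative assumptions, not EvolMem's actual data format. The fine-grained ability label is left as a free-form placeholder because the abstract does not name the abilities.

```python
from dataclasses import dataclass, field

# Only this top-level split is taken from the abstract; everything else is hypothetical.
MEMORY_TYPES = {"declarative", "non-declarative"}


@dataclass
class Turn:
    speaker: str  # e.g. "user" or "assistant"
    text: str


@dataclass
class Session:
    session_id: int  # hypothetical: sessions are ordered in time by this index
    turns: list[Turn] = field(default_factory=list)


@dataclass
class MemorySample:
    memory_type: str  # must be one of MEMORY_TYPES
    ability: str  # placeholder label for a fine-grained ability (names not given here)
    sessions: list[Session]
    question: str  # probe asked after the final session
    guideline: str  # sample-specific instructions for grading the answer

    def __post_init__(self) -> None:
        # Reject labels outside the two top-level memory categories.
        if self.memory_type not in MEMORY_TYPES:
            raise ValueError(f"unknown memory_type: {self.memory_type!r}")

    def num_turns(self) -> int:
        # Total dialogue turns across all sessions.
        return sum(len(s.turns) for s in self.sessions)

    def is_chronological(self) -> bool:
        # True if session ids are strictly increasing (ordered, no duplicates).
        ids = [s.session_id for s in self.sessions]
        return all(a < b for a, b in zip(ids, ids[1:]))


if __name__ == "__main__":
    sample = MemorySample(
        memory_type="declarative",
        ability="placeholder",
        sessions=[
            Session(1, [Turn("user", "I just moved to Lyon."), Turn("assistant", "Noted!")]),
            Session(2, [Turn("user", "Any tips for my new city?")]),
        ],
        question="Which city does the user live in?",
        guideline="Accept any answer that identifies Lyon.",
    )
    print(sample.num_turns(), sample.is_chronological())
```

In this sketch, complexity could be varied by changing the number of sessions or turns per sample; how EvolMem actually controls complexity is not specified in the text above.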
