LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu

TL;DR
This paper introduces LongMemEval, a benchmark for evaluating long-term memory in chat assistants, revealing significant performance gaps and proposing optimized memory design strategies to enhance recall and reasoning over sustained interactions.
Contribution
We present LongMemEval, a comprehensive benchmark for long-term memory in chat assistants, and propose a unified framework with design optimizations to improve memory performance.
Findings
Commercial chat assistants and long-context LLMs show a 30% accuracy drop when memorizing information across sustained interactions.
Memory optimizations significantly improve recall and question answering.
The benchmark provides a challenging testbed for future memory system improvements.
Abstract
Recent large language model (LLM)-driven chat assistant systems have integrated memory components to track user-assistant chat histories, enabling more accurate and personalized responses. However, their long-term memory capabilities in sustained interactions remain underexplored. We introduce LongMemEval, a comprehensive benchmark designed to evaluate five core long-term memory abilities of chat assistants: information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention. With 500 meticulously curated questions embedded within freely scalable user-assistant chat histories, LongMemEval presents a significant challenge to existing long-term memory systems, with commercial chat assistants and long-context LLMs showing a 30% accuracy drop on memorizing information across sustained interactions. We then present a unified framework that breaks down the…
Taxonomy
Topics: Recommender Systems and Techniques · Personal Information Management and User Behavior · AI in Service Interactions
