MemoryCD: Benchmarking Long-Context User Memory of LLM Agents for Lifelong Cross-Domain Personalization

Weizhi Zhang; Xiaokai Wei; Wei-Chieh Huang; Zheng Hui; Chen Wang; Michelle Gong; Philip S. Yu

arXiv:2603.25973 · cs.CL · March 30, 2026

TL;DR

MemoryCD is a benchmark for evaluating long-term, cross-domain user memory in large language model agents. It is built from real-world user behavior in the Amazon Review dataset and shows where current memory methods fall short.

Contribution

Introduces MemoryCD, presented as the first large-scale, user-centric, cross-domain memory benchmark for LLMs built from real-world data, together with a multi-faceted evaluation pipeline covering 4 personalization tasks over 12 domains.

Findings

01. Existing memory methods underperform in real-world, cross-domain scenarios.

02. MemoryCD provides a new testbed for lifelong personalization evaluation.

03. Evaluation across 14 models and 4 tasks reveals significant gaps in current memory capabilities.

Abstract

Recent advancements in Large Language Models (LLMs) have expanded context windows to million-token scales, yet benchmarks for evaluating memory remain limited to short-session synthetic dialogues. We introduce MemoryCD, the first large-scale, user-centric, cross-domain memory benchmark derived from lifelong real-world behaviors in the Amazon Review dataset. Unlike existing memory datasets that rely on scripted personas to generate synthetic user data, MemoryCD tracks authentic user interactions across years and multiple domains. We construct a multi-faceted long-context memory evaluation pipeline of 14 state-of-the-art LLM base models with 6 memory method baselines on 4 distinct personalization tasks over 12 diverse domains to evaluate an agent's ability to simulate real user behaviors in both single and cross-domain settings. Our analysis reveals that existing memory…
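The distinction between single-domain and cross-domain settings mentioned in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical: the `Interaction` fields, the `build_memory` helper, and the sample data are not taken from the paper or the dataset schema. It also assumes that the single-domain setting draws memory only from the target domain and the cross-domain setting only from other domains, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One hypothetical user event; field names are illustrative, not the dataset schema."""
    timestamp: int  # e.g. Unix seconds
    domain: str     # e.g. "Books"
    text: str       # review text or similar


def build_memory(history, target_domain, cutoff, cross_domain):
    """Select the interactions an agent may use as memory.

    Assumption: the single-domain setting keeps only target-domain history,
    and the cross-domain setting keeps only history from other domains.
    Only events strictly before `cutoff` are kept, so the held-out event
    being predicted never leaks into memory.
    """
    selected = [
        item for item in history
        if item.timestamp < cutoff
        and (item.domain != target_domain) == cross_domain
    ]
    # Oldest first, so the memory reads as a chronological user timeline.
    return sorted(selected, key=lambda item: item.timestamp)


# Made-up sample history for one user.
history = [
    Interaction(30, "Books", "Loved the sequel."),
    Interaction(10, "Electronics", "Battery died fast."),
    Interaction(20, "Books", "Slow start, great ending."),
    Interaction(40, "Books", "Held-out review to predict."),
]

single = build_memory(history, "Books", cutoff=40, cross_domain=False)
cross = build_memory(history, "Books", cutoff=40, cross_domain=True)
```

In this sketch, `single` holds the two earlier Books events and `cross` holds the one Electronics event; a benchmark built on real data may well define the split differently, for example by mixing all domains in the cross-domain case.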
