MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts
Zhen Tao, Jinxiang Zhao, Peng Liu, Dinghao Xi, Yanfang Chen, Wei Xu, Zhiyu Li

TL;DR
MemConflict introduces a diagnostic framework to evaluate long-term memory systems in language models, focusing on retrieval accuracy and conflict resolution across multi-session interactions.
Contribution
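It formalizes three conflict types (dynamic, static, and conditional) over temporal validity, factual correctness, and contextual applicability, builds a benchmark of simulated long-horizon histories with controlled cross-session conflicts and semantically similar distractors, and uses it to diagnose where memory systems succeed and fail.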
Findings
Memory correctness often diverges from retrieval and ranking.
Longer histories and distractors degrade performance.
Failures include missing supporting memories and ineffective retrieval.
Abstract
Long-term memory systems enable conversational agents based on large language models (LLMs) to retain, retrieve, and apply user-specific information across multi-session interactions. However, existing evaluations mainly assess outcome-level performance or temporal updating, providing limited insight into how systems retrieve and rank temporally valid, factually correct, and contextually applicable memory evidence under conflicting alternatives. To address this gap, we propose MemConflict, a diagnostic framework that treats memory validity as a query-conditioned fitness-for-use problem. MemConflict formalizes dynamic, static, and conditional conflicts over temporal validity, factual correctness, and contextual applicability. It simulates controlled long-horizon histories from structured user profiles, introduces cross-session conflicts, and injects semantically similar distractors to…
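To make the "query-conditioned fitness-for-use" idea concrete, here is a minimal Python sketch. It is not the paper's implementation. Every name in it (`Memory`, `ConflictType`, `conflict_for`, `first_valid_rank`, and all fields) is hypothetical, and it assumes the three conflict types pair in order with the three validity dimensions, which the abstract suggests but does not spell out. Distractors and relevance scoring are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class ConflictType(Enum):
    # Assumed pairing of conflict type to validity dimension (illustrative only).
    DYNAMIC = "temporal validity"
    STATIC = "factual correctness"
    CONDITIONAL = "contextual applicability"


@dataclass(frozen=True)
class Memory:
    text: str
    session: int                         # session in which the memory was written
    is_factual: bool = True              # False for an injected incorrect alternative
    condition: Optional[str] = None      # context where it applies; None means always
    superseded_in: Optional[int] = None  # session in which a later update replaced it


def conflict_for(memory: Memory, query_session: int,
                 query_context: Optional[str] = None) -> Optional[ConflictType]:
    """Return the dimension on which the memory is unfit for this query, or None."""
    if not memory.is_factual:
        return ConflictType.STATIC
    not_yet_written = memory.session > query_session
    already_replaced = (memory.superseded_in is not None
                        and memory.superseded_in <= query_session)
    if not_yet_written or already_replaced:
        return ConflictType.DYNAMIC
    if memory.condition is not None and memory.condition != query_context:
        return ConflictType.CONDITIONAL
    return None


def first_valid_rank(ranked: Sequence[Memory], query_session: int,
                     query_context: Optional[str] = None) -> Optional[int]:
    """1-based rank of the first fit-for-use memory in a retrieved list, else None."""
    for rank, memory in enumerate(ranked, start=1):
        if conflict_for(memory, query_session, query_context) is None:
            return rank
    return None


# Toy history: a location update in session 3 and a context-dependent preference.
old_city = Memory("User lives in Berlin", session=1, superseded_in=3)
new_city = Memory("User lives in Lisbon", session=3)
tea = Memory("User prefers tea", session=1, condition="evening")
ranked = [old_city, new_city, tea]
```

With this toy history, `first_valid_rank(ranked, query_session=4)` returns 2 because the Berlin memory has been superseded, while `first_valid_rank(ranked, query_session=2)` returns 1 because the update has not happened yet. The same stored memory is valid for one query and invalid for another, which is the sense in which validity depends on the query rather than on the memory alone.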
