MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts

Zhen Tao; Jinxiang Zhao; Peng Liu; Dinghao Xi; Yanfang Chen; Wei Xu; Zhiyu Li

arXiv:2605.20926 · cs.IR · May 21, 2026

TL;DR

MemConflict introduces a diagnostic framework to evaluate long-term memory systems in language models, focusing on retrieval accuracy and conflict resolution across multi-session interactions.

Contribution

It formalizes dynamic, static, and conditional memory conflicts, builds a benchmark of simulated long-horizon histories with controlled cross-session conflicts and semantically similar distractors, and diagnoses where memory systems succeed and fail.

Findings

01. Memory correctness often diverges from retrieval and ranking.

02. Longer histories and distractors degrade performance.

03. Failures include missing supporting memories and ineffective retrieval.
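
The first finding can be made concrete with a small sketch. It uses two standard retrieval measures, hit@k and reciprocal rank, to show how a system can retrieve the valid memory yet rank it below conflicting alternatives. The memory IDs and the choice of metrics are assumptions made for illustration; the summary above does not say which metrics the paper reports.

```python
def hit_at_k(ranked_ids, valid_id, k):
    """True if the valid memory appears among the top-k retrieved items."""
    return valid_id in ranked_ids[:k]


def reciprocal_rank(ranked_ids, valid_id):
    """1 / rank of the valid memory, or 0.0 if it was not retrieved."""
    for rank, mem_id in enumerate(ranked_ids, start=1):
        if mem_id == valid_id:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking: an outdated memory and a distractor outrank the valid one.
ranked = ["m_old", "m_distractor", "m_new"]
retrieved_in_top3 = hit_at_k(ranked, "m_new", k=3)  # the valid memory is retrieved
ranked_first = hit_at_k(ranked, "m_new", k=1)       # but it is not ranked first
rr = reciprocal_rank(ranked, "m_new")
```

Here the valid memory counts as retrieved at k=3 but not at k=1, so a downstream answer built from the top-ranked item could still be wrong.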

Abstract

Long-term memory systems enable conversational agents based on large language models (LLMs) to retain, retrieve, and apply user-specific information across multi-session interactions. However, existing evaluations mainly assess outcome-level performance or temporal updating, providing limited insight into how systems retrieve and rank temporally valid, factually correct, and contextually applicable memory evidence under conflicting alternatives. To address this gap, we propose MemConflict, a diagnostic framework that treats memory validity as a query-conditioned fitness-for-use problem. MemConflict formalizes dynamic, static, and conditional conflicts over temporal validity, factual correctness, and contextual applicability. It simulates controlled long-horizon histories from structured user profiles, introduces cross-session conflicts, and injects semantically similar distractors to…
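
The idea of memory validity as a query-conditioned check can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the `Memory` fields, the `valid_memories` helper, and the pairing of dynamic, static, and conditional conflicts with temporal validity, factual correctness, and contextual applicability (in that order) are all assumptions read off the abstract's wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Memory:
    key: str                       # attribute the memory is about (hypothetical field)
    text: str
    session: int                   # session in which the memory was written
    is_factual: bool = True        # assumed ground-truth label (static conflicts)
    context: Optional[str] = None  # condition under which it applies; None = always


def valid_memories(memories, query_session, query_context=None):
    """Return the memories that are fit for use for one query.

    A memory is kept if it was written no later than the query session,
    is factually correct, and applies in the query's context. Among the
    survivors for each key, the most recent one wins (dynamic conflicts).
    """
    eligible = [
        m for m in memories
        if m.session <= query_session
        and m.is_factual
        and (m.context is None or m.context == query_context)
    ]
    latest = {}
    for m in eligible:
        best = latest.get(m.key)
        if best is None or m.session > best.session:
            latest[m.key] = m
    return list(latest.values())


# Hypothetical history with one conflict of each kind.
history = [
    Memory("city", "Lives in Berlin", 1),
    Memory("city", "Lives in Munich", 5),                    # supersedes Berlin
    Memory("city", "Lives in Paris", 3, is_factual=False),   # incorrect statement
    Memory("drink", "Prefers tea when working late", 2, context="late_work"),
]

now = [m.text for m in valid_memories(history, query_session=6)]
earlier = [m.text for m in valid_memories(history, query_session=4)]
late = [m.text for m in valid_memories(history, 6, query_context="late_work")]
```

The same history yields different valid sets depending on when and in what context the query is asked, which is the sense in which validity is conditioned on the query rather than fixed per memory.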
