Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents
Yifei Li, Weidong Guo, Lingling Zhang, Rongman Xu, Muye Huang, Hui Liu, Lijiao Xu, Yu Xu, Jun Liu

TL;DR
LoCoMo-Plus is a new benchmark for evaluating long-term cognitive memory in LLMs. It focuses on implicit constraints such as user goals and values, and reveals challenges that existing metrics do not capture.
Contribution
The paper presents LoCoMo-Plus, a benchmark and evaluation framework for assessing cognitive memory in LLMs beyond factual recall, with an emphasis on implicit constraints and cue-trigger semantic disconnect.
Findings
Conventional string-matching metrics and explicit task-type prompting are misaligned with cognitive memory evaluation.
Models struggle with retaining and applying latent constraints.
The framework reveals failures in existing models not seen in prior benchmarks.
Abstract
Long-term conversational memory is a core capability for LLM-based dialogue systems, yet existing benchmarks and evaluation protocols primarily focus on surface-level factual recall. In realistic interactions, appropriate responses often depend on implicit constraints such as user state, goals, or values that are not explicitly queried later. To evaluate this setting, we introduce LoCoMo-Plus, a benchmark for assessing cognitive memory under cue-trigger semantic disconnect, where models must retain and apply latent constraints across long conversational contexts. We further show that conventional string-matching metrics and explicit task-type prompting are misaligned with such scenarios, and propose a unified evaluation framework based on constraint consistency. Experiments across diverse backbone models, retrieval-based methods, and memory systems demonstrate that cognitive…
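The abstract's claim that string-matching metrics are misaligned with latent constraints can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the dialogue, the forbidden-word list, and the helper names `token_f1` and `is_consistent` are hypothetical, and the keyword check is only a stand-in for a constraint-consistency judgment, not the procedure used in the paper.

```python
from collections import Counter


def token_f1(response: str, reference: str) -> float:
    """Token-overlap F1, a common string-matching metric."""
    resp = response.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(resp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical latent constraint from an earlier turn: the user is avoiding sugar.
FORBIDDEN = {"cake", "chocolate", "candy"}


def is_consistent(response: str) -> bool:
    """Toy stand-in for a constraint-consistency judge (keyword check only)."""
    return not (set(response.lower().split()) & FORBIDDEN)


# Hypothetical reference answer and two candidate responses to a later,
# semantically unrelated trigger such as "Any ideas for a treat tonight?"
reference = "maybe a fruit salad since you are cutting back on sugar"
violating = "maybe a chocolate cake since you are celebrating"
respecting = "berries with plain yogurt would fit your diet"

for name, text in [("violating", violating), ("respecting", respecting)]:
    print(name, round(token_f1(text, reference), 2), is_consistent(text))
```

In this contrived case the response that ignores the earlier constraint shares more tokens with the reference than the response that respects it, so an overlap metric ranks them in the wrong order while the consistency check does not.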
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Social Robot Interaction and HRI
