Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents

Yifei Li; Weidong Guo; Lingling Zhang; Rongman Xu; Muye Huang; Hui Liu; Lijiao Xu; Yu Xu; Jun Liu

arXiv:2602.10715 · cs.CL · February 12, 2026

TL;DR

LoCoMo-Plus is a benchmark for evaluating long-term cognitive memory in LLMs. It focuses on implicit constraints such as user state, goals, and values, and exposes failures that existing metrics do not capture.

Contribution

The paper presents LoCoMo-Plus, a benchmark and evaluation framework for assessing cognitive memory in LLMs beyond factual recall, with emphasis on implicit constraints and cue-trigger semantic disconnect.

Findings

01. Conventional metrics are misaligned with cognitive memory evaluation.

02. Models struggle with retaining and applying latent constraints.

03. The framework reveals failures in existing models not seen in prior benchmarks.

Abstract

Long-term conversational memory is a core capability for LLM-based dialogue systems, yet existing benchmarks and evaluation protocols primarily focus on surface-level factual recall. In realistic interactions, appropriate responses often depend on implicit constraints such as user state, goals, or values that are not explicitly queried later. To evaluate this setting, we introduce LoCoMo-Plus, a benchmark for assessing cognitive memory under cue-trigger semantic disconnect, where models must retain and apply latent constraints across long conversational contexts. We further show that conventional string-matching metrics and explicit task-type prompting are misaligned with such scenarios, and propose a unified evaluation framework based on constraint consistency. Experiments across diverse backbone models, retrieval-based methods, and memory systems demonstrate that cognitive…
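The abstract's claim that string-matching metrics are misaligned with implicit constraints can be illustrated with a small sketch. This is not the paper's evaluation code: the token-level F1 below is a generic string-overlap metric chosen for illustration, and the vegetarian scenario, the reference answer, and both candidate responses are hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Generic token-overlap F1, a common string-matching metric (illustrative)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical scenario: earlier in the conversation the user mentioned being
# vegetarian (the cue). Much later they ask for a dinner idea (the trigger)
# without restating that constraint.
reference = "suggest a vegetarian dinner such as lentil curry"
consistent = "how about mushroom risotto with a side salad"  # respects the constraint
violating = "suggest a dinner such as chicken curry"  # breaks the constraint

f1_consistent = token_f1(consistent, reference)
f1_violating = token_f1(violating, reference)

print(f"constraint-consistent response: F1 = {f1_consistent:.3f}")
print(f"constraint-violating response:  F1 = {f1_violating:.3f}")
```

In this toy case the response that violates the latent constraint scores higher on surface overlap than the one that respects it, which is the kind of mismatch a constraint-consistency judgment is meant to avoid.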

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Social Robot Interaction and HRI