Evaluating Memory Structure in LLM Agents

Alina Shutova; Alexandra Olenina; Ivan Vinogradov; Anton Sinitsin

arXiv:2602.11243 · cs.LG · February 13, 2026

TL;DR

This paper introduces StructMemEval, a benchmark to evaluate how well LLM-based agents can organize and utilize complex long-term memory structures, revealing current limitations and guiding future improvements.

Contribution

It proposes a new benchmark for testing how LLM agents organize complex memory, addressing a gap left by existing benchmarks, which focus on factual recall.

Findings

01

Simple retrieval-augmented LLMs struggle with structured memory tasks.

02

Memory agents can solve these tasks when prompted on how to organize their memory.

03

Without explicit prompting, modern LLMs often fail to recognize the memory structure a task requires.

Abstract

Modern LLM-based agents and chat assistants rely on long-term memory frameworks to store reusable knowledge, recall user preferences, and augment reasoning. As researchers create more complex memory architectures, it becomes increasingly difficult to analyze their capabilities and guide future memory designs. Most long-term memory benchmarks focus on simple fact retention, multi-hop recall, and time-based changes. While undoubtedly important, these capabilities can often be achieved with simple retrieval-augmented LLMs and do not test complex memory hierarchies. To bridge this gap, we propose StructMemEval, a benchmark that tests an agent's ability to organize its long-term memory, not just recall facts. We gather a suite of tasks that humans solve by organizing their knowledge in a specific structure: transaction ledgers, to-do lists, trees, and others. Our initial experiments show…
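
To illustrate why a transaction ledger is a memory-structure problem rather than a recall problem, here is a minimal Python sketch under our own assumptions. It is not StructMemEval's task format or any agent evaluated in the paper; the message format and the names `Ledger`, `topk_retrieve`, and `balance_from_snippets` are hypothetical. A structured memory updates a running balance on every write, while a flat log must be queried by a retriever (here a naive lexical top-k ranker) and the answer recomputed from whatever it returns. When more relevant messages exist than k, the flat approach silently drops some of them.

```python
from dataclasses import dataclass, field


def parse(message: str) -> tuple[str, str, float]:
    # Hypothetical message format: "<payer> paid <amount> for <payee>"
    payer, _, amount, _, payee = message.split()
    return payer, payee, float(amount)


@dataclass
class Ledger:
    """Structured memory: one running balance per person, updated on every write."""
    balances: dict[str, float] = field(default_factory=dict)

    def record(self, message: str) -> None:
        payer, payee, amount = parse(message)
        # The payer is owed `amount`; the payee owes it.
        self.balances[payer] = self.balances.get(payer, 0.0) + amount
        self.balances[payee] = self.balances.get(payee, 0.0) - amount

    def balance(self, person: str) -> float:
        return self.balances.get(person, 0.0)


def topk_retrieve(log: list[str], query: str, k: int) -> list[str]:
    # Naive lexical retriever: rank messages by how many query words they contain.
    # sorted() is stable, so ties keep their original log order.
    words = set(query.lower().split())
    ranked = sorted(log, key=lambda m: len(words & set(m.lower().split())), reverse=True)
    return ranked[:k]


def balance_from_snippets(snippets: list[str], person: str) -> float:
    # Flat-memory answer: recompute the balance from only the retrieved messages.
    total = 0.0
    for s in snippets:
        payer, payee, amount = parse(s)
        if payer == person:
            total += amount
        if payee == person:
            total -= amount
    return total


log = [
    "alice paid 30 for bob",
    "carol paid 10 for dave",
    "alice paid 20 for bob",
    "bob paid 15 for alice",
]

ledger = Ledger()
for m in log:
    ledger.record(m)

print(ledger.balance("bob"))  # -35.0: all three of bob's transactions counted
print(balance_from_snippets(topk_retrieve(log, "bob", k=2), "bob"))  # -50.0: one transaction missed
```

The design choice being illustrated is where the work happens: the ledger folds each update into its structure at write time, so a query is a single lookup, whereas the flat store defers aggregation to query time and depends on retrieval returning every relevant entry.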

Taxonomy

Topics: Topic Modeling · AI in Service Interactions · Personal Information Management and User Behavior