Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Egor Cherepanov; Nikita Kachaev; Artem Zholus; Alexey K. Kovalev; Aleksandr I. Panov

arXiv:2412.06531·cs.LG·March 5, 2026

TL;DR

This paper clarifies the concept of memory in reinforcement learning agents, proposes a standardized evaluation methodology, and demonstrates its importance through empirical experiments to enable objective comparison of memory capabilities.

Contribution

It provides precise definitions of memory types in RL, categorizes agent memory, and introduces a standardized evaluation framework for memory assessment.

Findings

01 · Standardized methodology improves evaluation consistency

02 · Empirical results show the importance of proper memory assessment

03 · Violation of methodology leads to unreliable memory judgments

Abstract

The incorporation of memory into agents is essential for numerous tasks within the domain of Reinforcement Learning (RL). In particular, memory is paramount for tasks that require the use of past information, adaptation to novel environments, and improved sample efficiency. However, the term "memory" encompasses a wide range of concepts, which, coupled with the lack of a unified methodology for validating an agent's memory, leads to erroneous judgments about agents' memory capabilities and prevents objective comparison with other memory-enhanced agents. This paper aims to streamline the concept of memory in RL by providing practical precise definitions of agent memory types, such as long-term vs. short-term memory and declarative vs. procedural memory, inspired by cognitive science. Using these definitions, we categorize different classes of agent memory, propose a robust experimental…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

This paper is generally well-written and well-motivated. This paper studies the relatively underexplored subject of “memory in RL agents” and presents formal definitions of various memory concepts, highlighting the importance of appropriate experimental configurations for evaluating them. Pictorial illustrations help clarify the concepts.

Weaknesses

Some of the definitions require more explanation to fully understand the concept. Paper formatting could be improved. Please see the questions and comments.

Reviewer 02 · Rating 4 · Confidence 2

Strengths

**Conceptual Clarification and Formalization** The paper successfully translates ambiguous cognitive science terms (e.g., STM/LTM, declarative/procedural memory) into precise, quantifiable, and verifiable definitions within RL (see Definitions 4.4–4.6). This formalization fills a significant void in current RL literature, where “memory” is often used loosely or inconsistently.

**Proposal of a Unified Evaluation Framework** The distinction between Memory Decision-Making (Memory DM) and Meta-RL i

Weaknesses

**Abstract Treatment of Memory Mechanisms** While Definition 4.7 defines a memory mechanism as a mapping from base context K to effective context K_eff, it does not differentiate between implementation strategies (e.g., external memory, world models, state-space models). A deeper discussion of how different architectures realize µ(K) would strengthen the framework.

**Limited Coverage of Other Memory Types** The paper focuses primarily on declarative memory along the temporal axis. Although the

Reviewer 03 · Rating 4 · Confidence 2

Strengths

1. They standardized the definitions of different types of memory, thereby providing a framework for fair evaluation of subsequent models.
2. The definitions and formalizations are careful and rigorous.

Weaknesses

The paper spends too much time on the basic theoretical and general analyses, while the experiments and analyses conducted under this theoretical framework are rather limited. Although the authors include several validation experiments demonstrating the necessity and importance of defining and distinguishing different types of memory, they do not provide any insightful conclusions or analyses derived from evaluations based on this distinction. In other words, what I expected to see was either a

Taxonomy

Topics: Reinforcement Learning in Robotics