Quantifying Memory Use in Reinforcement Learning with Temporal Range

Rodney Lafuente-Mercado; Daniela Rus; T. Konstantin Rusch

arXiv:2512.06204·cs.LG·December 9, 2025

TL;DR

This paper introduces Temporal Range, a metric to quantify how much past observations influence a reinforcement learning policy, aiding in understanding and optimizing memory use across different tasks and architectures.

Contribution

The paper proposes a novel, model-agnostic metric called Temporal Range that measures the temporal influence of past inputs on RL policies, extending previous range measures to vector outputs.

Findings

01

Temporal Range remains small in fully observed control tasks.

02

It scales with the true lag in Copy-$k$ tasks.

03

It aligns with the minimum history window needed for near-optimal performance.

Abstract

How much does a trained RL policy actually use its past observations? We propose Temporal Range, a model-agnostic metric that treats first-order sensitivities of multiple vector outputs across a temporal window to the input sequence as a temporal influence profile and summarizes it by the magnitude-weighted average lag. Temporal Range is computed via reverse-mode automatic differentiation from the Jacobian blocks $\partial y_{s} / \partial x_{t} \in \mathbb{R}^{c \times d}$ averaged over final timesteps $s \in \{t + 1, \dots, T\}$ and is well-characterized in the linear setting by a small set of natural axioms. Across diagnostic and control tasks (POPGym; flicker/occlusion; Copy-$k$) and architectures (MLPs, RNNs, SSMs), Temporal Range (i) remains small in fully observed control, (ii) scales with the task's ground-truth lag in Copy-$k$, and (iii) aligns with the minimum history window required…
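To make "magnitude-weighted average lag" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it looks only at the final output step rather than averaging over several output steps, it uses central finite differences in place of reverse-mode automatic differentiation, and it takes the Frobenius norm as the magnitude. The function names `influence_profile`, `temporal_range`, and `make_copy_k` are hypothetical.

```python
import math


def influence_profile(policy, xs, eps=1e-5):
    """Approximate ||d y_T / d x_t||_F for every input step t.

    Assumption: central finite differences stand in for autodiff.
    policy maps a sequence (list of d-dim lists) to a c-dim output list.
    Returns weights indexed by lag, where lag 0 is the most recent input.
    """
    T = len(xs)
    weights = [0.0] * T
    for t in range(T):
        squared = 0.0
        for j in range(len(xs[t])):
            plus = [row[:] for row in xs]
            minus = [row[:] for row in xs]
            plus[t][j] += eps
            minus[t][j] -= eps
            # One column of the Jacobian block for input coordinate (t, j).
            for a, b in zip(policy(plus), policy(minus)):
                squared += ((a - b) / (2 * eps)) ** 2
        weights[T - 1 - t] = math.sqrt(squared)
    return weights


def temporal_range(weights):
    """Magnitude-weighted average lag of an influence profile."""
    total = sum(weights)
    if total == 0.0:
        return 0.0  # the output ignores every input
    return sum(lag * w for lag, w in enumerate(weights)) / total


def make_copy_k(k):
    """Toy stand-in for a Copy-k policy: output the input seen k steps ago."""
    def policy(xs):
        return list(xs[-1 - k])
    return policy


xs = [[float(i), float(-i)] for i in range(8)]
rho = temporal_range(influence_profile(make_copy_k(3), xs))
print(round(rho, 6))
```

For this toy Copy-3 map, only the input three steps back has a nonzero Jacobian block, so the weighted average lag lands on 3, which mirrors the claim that the metric scales with the ground-truth lag in Copy-$k$.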

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. A lookback metric is useful in RL POMDP contexts.
2. The proposed metric does what it advertises on the tested tasks.
3. The paper format is nice and the paper is easy to read.

Weaknesses

1. The contributions of this paper are a bit on the lighter side, comprising a single metric and evaluation on toy experiments.
2. The metric itself appears to be a weighted average. Tasks like CartPole and Copy have strong temporal correlations, looking at the last few timesteps or precisely $k$ timesteps back respectively. Tasks like 3D navigation are likely to have multimodal lookbacks that could provide misleading results with this metric.
3. The performance on some tasks in Table 2 seems un

Reviewer 02 · Rating 4 · Confidence 3

Strengths

Overall, I find the work to be clearly presented and original. I have some concerns about the significance of the proposed metric that I will discuss below in weaknesses, but first focus on the strengths of the work. The proposed metric of temporal range is clearly defined and introduced. I was not familiar with most of the related theoretical prior work but could follow the definitions and axioms as defined. The metric itself appears rather elegant and simple which I appreciate. I also apprec

Weaknesses

Below, I highlight any weaknesses that I see as critical / major with (**Major**). I'd expect these to be addressed for this work to be considered for acceptance.

## Understanding the Usefulness of Temporal Range

1. **Major:** My main concern about this work is the significance of the proposed temporal range metric. While Section 3.5 and 7 argue that the temporal range can be used to identify whether a shorter or longer context should be chosen for the current model, it appears difficult to ide

Reviewer 03 · Rating 2 · Confidence 3

Strengths

Originality: The paper's central idea is very original and the experiments are well designed to prove its correctness.

Significance: The paper does a good job of justifying the utility of the temporal range metric that they introduce. Although I was initially unsure of it being useful, I was convinced by the end of the paper.

Weaknesses

- The citation style should be changed to have the authors' names in parentheses
- Figure captions are too short: they should not only describe what is shown in the figure, but also the main message the figure is trying to show in relation to the story of the paper
- The presentation of the figures could be improved a lot. For example, if one of the points being made is that \ro_t matches the minimum history window required for near-optimal return, then the \ro_t for each task should be shown as

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning