Retrieval-Augmented Decision Transformer: External Memory for In-context RL

Thomas Schmied; Fabian Paischer; Vihang Patil; Markus Hofmarcher; Razvan Pascanu; Sepp Hochreiter

arXiv:2410.07071·cs.LG·August 14, 2025

Open Access · 1 Repo · 3 Datasets · 3 Reviews

TL;DR

The paper introduces RA-DT, a retrieval-augmented decision transformer that uses external memory to improve in-context reinforcement learning, especially in complex environments with long episodes and sparse rewards.

Contribution

RA-DT employs a domain-agnostic external memory mechanism for retrieving relevant experiences, enabling effective in-context RL in complex environments with long episodes.

Findings

01

RA-DT outperforms baselines on grid-world environments.

02

RA-DT needs only a fraction of the context length that existing methods require.

03

The retrieval component does not require training and is domain-agnostic.

Abstract

In-context learning (ICL) is the ability of a model to learn a new task by observing a few exemplars in its context. While prevalent in NLP, this capability has recently also been observed in Reinforcement Learning (RL) settings. Prior in-context RL methods, however, require entire episodes in the agent's context. Given that complex environments typically lead to long episodes with sparse rewards, these methods are constrained to simple environments with short episodes. To address these challenges, we introduce Retrieval-Augmented Decision Transformer (RA-DT). RA-DT employs an external memory mechanism to store past experiences from which it retrieves only sub-trajectories relevant for the current situation. The retrieval component in RA-DT does not require training and can be entirely domain-agnostic. We evaluate the capabilities of RA-DT on grid-world environments, robotics…
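The storage-and-retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SubTrajectoryMemory`, the helper `split_into_subtrajectories`, the fixed random projection used as a training-free encoder, the mean-pooling step, and the cosine-similarity search are all assumptions chosen for illustration. It only shows how past episodes might be cut into sub-trajectories, embedded without training, and looked up by similarity to the current situation.

```python
import numpy as np


def split_into_subtrajectories(states, length):
    """Cut an episode of shape (T, state_dim) into non-overlapping chunks of `length` steps."""
    n_chunks = len(states) // length
    return [states[i * length:(i + 1) * length] for i in range(n_chunks)]


class SubTrajectoryMemory:
    """Hypothetical external memory that stores sub-trajectories keyed by a fixed embedding."""

    def __init__(self, state_dim, embed_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: a frozen random projection stands in for a training-free encoder.
        self.proj = rng.standard_normal((state_dim, embed_dim))
        self.keys = []    # one unit-length embedding per stored sub-trajectory
        self.values = []  # the stored sub-trajectories themselves

    def embed(self, chunk):
        # Project each state, mean-pool over time, then L2-normalise so that
        # a dot product between two embeddings equals their cosine similarity.
        vec = (chunk @ self.proj).mean(axis=0)
        return vec / (np.linalg.norm(vec) + 1e-8)

    def add_episode(self, states, length):
        # Store every sub-trajectory of the episode together with its key.
        for chunk in split_into_subtrajectories(states, length):
            self.keys.append(self.embed(chunk))
            self.values.append(chunk)

    def retrieve(self, query_chunk, k=2):
        # Return (index, similarity) for the k stored sub-trajectories most similar to the query.
        sims = np.stack(self.keys) @ self.embed(query_chunk)
        top = np.argsort(-sims)[:k]
        return [(int(i), float(sims[i])) for i in top]
```

In this sketch, an agent would call `retrieve` with its most recent steps and place only the returned sub-trajectories in the model's context, rather than entire past episodes. Querying with a chunk that is already stored returns that chunk first, with similarity close to 1.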

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1. The proposed RA-DT leverages an external memory mechanism to store and retrieve relevant sub-trajectories, effectively tackling the challenge of managing context in environments with long episodes and sparse rewards.
2. The authors conducted extensive experiments to showcase the method’s effectiveness and to highlight the contribution of each module.
3. The authors released datasets for multiple environments, offering valuable resources to support future research in in-context RL.

Weaknesses

1. Although the authors explain the rationale for using each module in RA-DT and demonstrate their effectiveness through experiments, the proposed RA-DT appears to be a combination of various existing methods [1, 2, 3]. What specific innovations does each of these modules introduce compared to the original approaches?
2. The processes of searching for similar experiences and reweighting retrieved experiences introduce additional computational overhead, which is neither discussed in detail nor evalu

Reviewer 02 · Rating 5 · Confidence 4

Strengths

The retrieval-augmentation idea is compelling. It particularly shines in its ability to cut down the context length required for in-context reinforcement learning. Further, the idea of using pre-trained language models for the domain-agnostic embedding is very interesting and could be useful in this type of in-context RL. Lastly, the work evaluates the approach on a broad variety of problems, ranging from toy-like problems to larger and more complex ones.

Weaknesses

While the work provides a broad evaluation of the proposed method, the experiments highlight its shortcomings. Often the proposed method does not achieve better in-context learning abilities, particularly on problems that are not grid-worlds. Because of these shortcomings, the authors provide a longer discussion section on the potential limitations of offline in-context RL methods, though this discussion does not provide explanations or a better understanding of why the proposed method did

Reviewer 03 · Rating 1 · Confidence 3

Strengths

- Model-agnostic retrieval augmentation
- Smaller context length and a boost in performance

Weaknesses

1) Novelty: The novelty of this work seems limited, as it primarily leverages existing techniques such as retrieval augmentation and Decision Transformers.
2) Quality data availability and relevance: If relevant, high-quality data is not available in storage, the method is not very helpful.

Code & Models

Repositories

ml-jku/RA-DT
PyTorch · Official

Datasets

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Context-Aware Activity Recognition Systems · Robotics and Automated Systems

Methods: Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings