Filtering Learning Histories Enhances In-Context Reinforcement Learning

Weiqin Chen; Xinjie Zhang; Dharmashankar Subramanian; Santiago Paternain

arXiv:2505.15143·cs.LG·May 22, 2025

Open Access

TL;DR

This paper introduces learning history filtering (LHF), a dataset preprocessing method for in-context reinforcement learning (ICRL). LHF reweights and filters learning histories by their improvement and stability, so that less suboptimal behavior is inherited from the source algorithm, and it yields more robust performance across environments and algorithms.

Contribution

To the authors' knowledge, LHF is the first approach that addresses the inheritance of suboptimal behaviors in ICRL through dataset filtering. It is compatible with existing ICRL algorithms and remains effective on noisy data.

Findings

01. LHF improves performance on ICRL benchmarks.

02. LHF is robust across different hyperparameters and sampling strategies.

03. LHF performs especially well with noisy data.

Abstract

Transformer models (TMs) have exhibited remarkable in-context reinforcement learning (ICRL) capabilities, allowing them to generalize to and improve in previously unseen environments without re-training or fine-tuning. This is typically accomplished by imitating the complete learning histories of a source RL algorithm over a substantial number of pretraining environments. However, doing so may transfer suboptimal behaviors inherited from the source algorithm or dataset. In this work, we therefore address the issue of inheriting suboptimality from the perspective of dataset preprocessing. Motivated by the success of weighted empirical risk minimization, we propose a simple yet effective approach, learning history filtering (LHF), to enhance ICRL by reweighting and filtering the learning histories based on their improvement and stability characteristics. To the best of our knowledge, LHF…
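
The abstract does not give the exact improvement and stability criteria, so the following is only a rough sketch of what filtering learning histories on those two characteristics could look like. Everything in it is an assumption made for illustration: a history is reduced to a list of per-episode returns, "improvement" is taken as the late-half mean return minus the early-half mean return, "stability" as the negative spread of episode-to-episode changes, and the names `improvement`, `stability`, `score`, `filter_histories`, `keep_fraction`, and `stability_weight` are hypothetical. It performs hard filtering only and does not reproduce the paper's reweighting scheme.

```python
from statistics import mean, pstdev


def improvement(returns):
    """Assumed metric: mean return of the later half minus the earlier half."""
    half = len(returns) // 2
    return mean(returns[half:]) - mean(returns[:half])


def stability(returns):
    """Assumed metric: negative spread of episode-to-episode return changes.

    Values closer to zero mean a steadier learning curve.
    """
    deltas = [b - a for a, b in zip(returns, returns[1:])]
    return -pstdev(deltas)


def score(returns, stability_weight=0.5):
    """Hypothetical combined score; the weighting is an illustrative choice."""
    return improvement(returns) + stability_weight * stability(returns)


def filter_histories(histories, keep_fraction=0.5, stability_weight=0.5):
    """Return the indices of the top-scoring histories, in original order."""
    scores = [score(h, stability_weight) for h in histories]
    n_keep = max(1, round(keep_fraction * len(histories)))
    ranked = sorted(range(len(histories)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy per-episode returns for four learning histories (made-up numbers).
histories = [
    [0, 1, 2, 3, 4, 5],  # steady improvement
    [2, 2, 2, 2, 2, 2],  # stable but no improvement
    [0, 5, 0, 5, 0, 5],  # oscillating, unstable
    [1, 1, 2, 3, 3, 4],  # mostly steady improvement
]
kept = filter_histories(histories)
print(kept)  # -> [0, 3]
```

Each history needs at least two episodes for these toy metrics to be defined. In this sketch only the kept histories would then be passed to the ICRL pretraining stage; the ranking rule and the 50% cutoff are placeholders, not values from the paper.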

Taxonomy

Topics: Neural dynamics and brain function · Reinforcement Learning in Robotics

Methods: Attention Is All You Need · Softmax · Convolution · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization · Dense Prediction Transformer