Filtering Learning Histories Enhances In-Context Reinforcement Learning
Weiqin Chen, Xinjie Zhang, Dharmashankar Subramanian, Santiago Paternain

TL;DR
This paper introduces learning history filtering (LHF), a dataset preprocessing method that improves in-context reinforcement learning by filtering out suboptimal learning histories, yielding more robust performance across environments and algorithms.
Contribution
LHF is the first approach to prevent inheriting suboptimal behaviors in ICRL through dataset filtering, compatible with existing algorithms and effective in noisy data scenarios.
Findings
LHF improves performance on ICRL benchmarks.
LHF is robust across different hyperparameters and sampling strategies.
LHF performs especially well with noisy data.
Abstract
Transformer models (TMs) have exhibited remarkable in-context reinforcement learning (ICRL) capabilities, allowing them to generalize to and improve in previously unseen environments without re-training or fine-tuning. This is typically accomplished by imitating the complete learning histories of a source RL algorithm over a large number of pretraining environments, which, however, may transfer suboptimal behaviors inherited from the source algorithm/dataset. In this work, we therefore address the issue of inheriting suboptimality from the perspective of dataset preprocessing. Motivated by the success of weighted empirical risk minimization, we propose a simple yet effective approach, learning history filtering (LHF), to enhance ICRL by reweighting and filtering the learning histories based on their improvement and stability characteristics. To the best of our knowledge, LHF…
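The abstract describes filtering learning histories by their improvement and stability but, as excerpted here, does not give the scoring rule. The following is only a minimal sketch of the general idea, not the authors' method. It assumes each learning history is summarized as a list of per-episode returns, and the functions improvement, stability, score, and filter_histories, along with the parameters k, lam, and keep_fraction, are hypothetical names introduced for illustration.

```python
from statistics import mean, pstdev

def improvement(returns, k=2):
    # Assumed proxy for improvement: mean of the last k returns
    # minus mean of the first k returns.
    return mean(returns[-k:]) - mean(returns[:k])

def stability(returns):
    # Assumed proxy for stability: negative standard deviation of
    # episode-to-episode changes (smoother learning curves score higher).
    diffs = [b - a for a, b in zip(returns, returns[1:])]
    return -pstdev(diffs)

def score(returns, lam=0.5, k=2):
    # Hypothetical combined score; lam trades off stability against improvement.
    return improvement(returns, k) + lam * stability(returns)

def filter_histories(histories, keep_fraction=0.5, lam=0.5, k=2):
    # Rank histories by score and keep the top fraction (at least one).
    ranked = sorted(histories, key=lambda h: score(h, lam, k), reverse=True)
    n_keep = max(1, round(keep_fraction * len(ranked)))
    return ranked[:n_keep]

# Toy learning histories (per-episode returns), made up for illustration.
histories = [
    [0.0, 0.2, 0.5, 0.8, 1.0],  # steady improvement
    [0.0, 0.9, 0.1, 1.0, 0.2],  # erratic
    [0.3, 0.3, 0.3, 0.3, 0.3],  # flat, no improvement
    [0.1, 0.3, 0.4, 0.7, 0.9],  # steady improvement
]
kept = filter_histories(histories, keep_fraction=0.5)
print(kept)
```

On this toy data the two steadily improving histories are retained and the erratic and flat ones are dropped. A hard top-fraction cut is just one possible design; the abstract also mentions reweighting, which could instead assign each history a sampling weight derived from its score.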
Taxonomy
Topics: Neural dynamics and brain function · Reinforcement Learning in Robotics
Methods: Attention Is All You Need · Softmax · Convolution · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization · Dense Prediction Transformer
