In-Context Reinforcement Learning From Suboptimal Historical Data
Juncheng Dong, Moyang Guo, Ethan X. Fang, Zhuoran Yang, Vahid Tarokh

TL;DR
This paper introduces the Decision Importance Transformer (DIT), a novel framework that enables in-context reinforcement learning from suboptimal historical data by emulating actor-critic algorithms with transformers.
Contribution
The paper proposes DIT, a transformer-based method that estimates advantage functions and improves policy learning from suboptimal offline datasets in RL tasks.
Findings
DIT outperforms standard methods on bandit and MDP problems.
The approach effectively leverages suboptimal data to learn near-optimal policies.
Transformers can emulate actor-critic algorithms in an in-context learning setting.
Abstract
Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we initially train a transformer on an offline dataset consisting of trajectories collected from various RL tasks, and then fix and use this transformer to create an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavioral policies. In this case, standard autoregressive training corresponds to imitation learning and results in suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value…
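The contrast the abstract draws between plain imitation and an actor-critic style objective can be sketched in a few lines. The snippet below is a minimal illustration only, not the paper's implementation: the function names, the exponential weighting `exp(advantage / temperature)`, and the `temperature` parameter are all assumptions chosen for clarity, since the summary above states only that DIT estimates advantage functions and uses them to improve on imitation of suboptimal data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_loss(logits_batch, actions):
    """Mean negative log-likelihood of the logged actions (behavior cloning).

    Every logged action counts equally, so a policy trained this way
    reproduces the behavior policy, including its mistakes.
    """
    nll = [-math.log(softmax(logits)[a]) for logits, a in zip(logits_batch, actions)]
    return sum(nll) / len(nll)


def advantage_weighted_loss(logits_batch, actions, advantages, temperature=1.0):
    """Negative log-likelihood reweighted by exponentiated advantage estimates.

    ASSUMPTION: exponential weights normalized over the batch. This is one
    common weighting scheme, used here for illustration; the source does not
    specify DIT's exact loss.
    """
    weights = [math.exp(adv / temperature) for adv in advantages]
    total = sum(weights)
    loss = 0.0
    for logits, a, w in zip(logits_batch, actions, weights):
        loss += (w / total) * -math.log(softmax(logits)[a])
    return loss


# Hypothetical toy batch: two logged steps, two actions each.
logits = [[2.0, 0.0], [2.0, 0.0]]
actions = [0, 1]           # actions taken by the suboptimal behavior policy
advantages = [1.0, -1.0]   # hypothetical estimates: step 0 was good, step 1 was bad

print(imitation_loss(logits, actions))
print(advantage_weighted_loss(logits, actions, advantages))
```

With equal advantages the two losses coincide. Once the estimates differ, the weighted loss pulls the policy toward the logged actions judged better than average, which is the sense in which such an objective can improve on the data it was trained on.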
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
