In-Context Reinforcement Learning From Suboptimal Historical Data

Juncheng Dong; Moyang Guo; Ethan X. Fang; Zhuoran Yang; Vahid Tarokh

arXiv:2601.20116 · cs.LG · January 29, 2026

TL;DR

This paper introduces the Decision Importance Transformer (DIT), a novel framework that enables in-context reinforcement learning from suboptimal historical data by emulating actor-critic algorithms with transformers.

Contribution

The paper proposes DIT, a transformer-based method that estimates advantage functions and improves policy learning from suboptimal offline datasets in RL tasks.
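The paper's method estimates advantage functions with a transformer that reads the offline data in context. The sketch below is a much simpler stand-in, included only to make the quantity concrete. It computes tabular advantage estimates A(a) = Q(a) − V for a single-state (bandit) task from logged (action, reward) pairs, using plain Monte Carlo averaging rather than the paper's transformer. The function name and the data format are hypothetical.

```python
from collections import defaultdict


def estimate_advantages(dataset):
    """Tabular advantage estimates for a single-state (bandit) task.

    Hypothetical helper, not the paper's transformer-based estimator.
    dataset: list of (action, reward) pairs logged by an unknown behavior policy.
    Returns {action: Q(action) - V}, where Q is the per-action mean reward and
    V is the expected reward under the empirical behavior policy.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, reward in dataset:
        totals[action] += reward
        counts[action] += 1
    # Q(a): Monte Carlo mean reward of each logged action.
    q = {a: totals[a] / counts[a] for a in counts}
    # V = sum_a pi_b(a) Q(a) with pi_b(a) = n_a / N, which reduces to the overall mean reward.
    v = sum(totals.values()) / len(dataset)
    return {a: q[a] - v for a in q}


# Usage: action 0 is taken often but pays poorly; action 1 is rare but pays well.
logged = [(0, 0.0), (0, 1.0), (0, 0.0), (0, 0.0), (1, 1.0), (1, 1.0)]
print(estimate_advantages(logged))  # {0: -0.25, 1: 0.5}
```

Actions that the behavior policy favors but that pay less than its average receive negative advantage. These are the samples that an advantage-aware learner would down-weight relative to plain imitation.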

Findings

1. DIT outperforms standard methods on bandit and MDP problems.

2. The approach effectively leverages suboptimal data to learn near-optimal policies.

3. Transformers can emulate actor-critic algorithms in an in-context learning setting.

Abstract

Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we first train a transformer on an offline dataset consisting of trajectories collected from various RL tasks, and then freeze this transformer and use it to create an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavioral policies. In this case, standard autoregressive training corresponds to imitation learning and results in suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value…
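The abstract's point that autoregressive training on suboptimal data reduces to imitation can be illustrated on a three-armed bandit. This is a minimal sketch under strong assumptions, not the DIT algorithm. It assumes known arm means and infinite data, and it swaps in an exponentiated-advantage weighting in the style of advantage-weighted regression. The function names, the example numbers, and the temperature `beta` are all hypothetical. Maximum-likelihood imitation simply recovers the behavior policy. Maximizing the advantage-weighted likelihood sum_a pi_b(a) w(a) log pi(a) instead yields pi(a) proportional to pi_b(a) w(a).

```python
import math


def behavior_cloning(behavior_probs):
    # With infinite data, maximum-likelihood imitation recovers the behavior policy exactly.
    return list(behavior_probs)


def advantage_weighted(behavior_probs, mean_rewards, beta):
    # Hypothetical AWR-style reweighting: w(a) = exp(A(a) / beta), A(a) = Q(a) - V.
    # V is the behavior policy's expected reward. In a bandit it cancels after
    # normalization, but subtracting it keeps the exponents small.
    v = sum(p * r for p, r in zip(behavior_probs, mean_rewards))
    weights = [p * math.exp((r - v) / beta) for p, r in zip(behavior_probs, mean_rewards)]
    z = sum(weights)
    # Maximizer of sum_a pi_b(a) w(a) log pi(a) over the probability simplex.
    return [w / z for w in weights]


def expected_reward(policy, mean_rewards):
    return sum(p * r for p, r in zip(policy, mean_rewards))


# Usage: the behavior policy mostly pulls the worst arm.
behavior = [0.7, 0.2, 0.1]
means = [0.2, 0.5, 0.9]
bc = behavior_cloning(behavior)
aw = advantage_weighted(behavior, means, beta=0.1)
print("imitation:", round(expected_reward(bc, means), 2))
print("advantage-weighted:", round(expected_reward(aw, means), 2))
```

Smaller values of `beta` push the reweighted policy harder toward the best arm. Larger values keep it closer to the behavior policy, which is the usual trade-off when the advantage estimates themselves are noisy.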

Videos

In-Context Reinforcement Learning From Suboptimal Historical Data · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)