Online Finetuning Decision Transformers with Pure RL Gradients

Junkai Luo; Yinglun Zhu

arXiv:2601.00167 · cs.LG · January 5, 2026

TL;DR

This paper introduces algorithms for online finetuning of Decision Transformers with pure reinforcement learning gradients. It identifies hindsight return relabeling as the obstacle that makes importance-sampling-based methods such as GRPO unstable in this setting, and reports state-of-the-art results across multiple benchmarks.

Contribution

It presents the first methods enabling online finetuning of Decision Transformers with pure RL gradients, including adaptations of GRPO and new techniques for stability and exploration.

Findings

1. Outperforms existing online Decision Transformer methods.
2. Achieves state-of-the-art results across multiple benchmarks.
3. Demonstrates stability and efficiency improvements in online RL finetuning.

Abstract

Decision Transformers (DTs) have emerged as a powerful framework for sequential decision making by formulating offline reinforcement learning (RL) as a sequence modeling problem. However, extending DTs to online settings with pure RL gradients remains largely unexplored, as existing approaches continue to rely heavily on supervised sequence-modeling objectives during online finetuning. We identify hindsight return relabeling -- a standard component in online DTs -- as a critical obstacle to RL-based finetuning: while beneficial for supervised learning, it is fundamentally incompatible with importance sampling-based RL algorithms such as GRPO, leading to unstable training. Building on this insight, we propose new algorithms that enable online finetuning of Decision Transformers using pure reinforcement learning gradients. We adapt GRPO to DTs and introduce several key modifications,…
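The tension between hindsight relabeling and importance sampling described in the abstract can be illustrated with a toy sketch. It is not the paper's algorithm. The linear return-conditioned Gaussian policy (`toy_policy_mean`), its weight, and all numeric values are hypothetical. The mechanism shown is one plausible reading of the abstract: if the return-to-go an action was conditioned on is replaced after the fact, the stored behavior log-probability no longer matches the relabeled context, so the importance ratio departs from 1 even when the parameters have not changed. The group-standardized advantage and clipped surrogate follow the generic GRPO/PPO form.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Standardize episode returns within one group of rollouts (GRPO-style)."""
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def clipped_surrogate(logp_new, logp_old, advantage, clip=0.2):
    """Generic PPO/GRPO clipped objective for a single action."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped * advantage)


def gaussian_logp(action, mean_, std=1.0):
    """Log-density of a 1-D Gaussian policy."""
    return -0.5 * ((action - mean_) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def toy_policy_mean(rtg, weight=0.5):
    """Hypothetical return-conditioned policy: action mean is linear in RTG."""
    return weight * rtg


action = 1.0
target_rtg = 4.0    # RTG the policy was conditioned on when it acted
achieved_rtg = 1.0  # RTG after hindsight relabeling (assumed value)

# Log-probability recorded at rollout time, under the original conditioning.
logp_behavior = gaussian_logp(action, toy_policy_mean(target_rtg))

# Same parameters, same conditioning: the ratio is exactly 1.
logp_same = gaussian_logp(action, toy_policy_mean(target_rtg))
ratio_consistent = math.exp(logp_same - logp_behavior)

# Same parameters, relabeled conditioning: the ratio is no longer 1.
logp_relabeled = gaussian_logp(action, toy_policy_mean(achieved_rtg))
ratio_relabeled = math.exp(logp_relabeled - logp_behavior)

advantages = group_relative_advantages([1.0, 2.0, 3.0])
objective = clipped_surrogate(logp_relabeled, logp_behavior, advantages[2])
```

In this toy setup the mismatch appears before any gradient step is taken, so clipping in `clipped_surrogate` activates on a ratio that reflects the changed conditioning rather than a policy update.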

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Risk and Portfolio Optimization