Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

TL;DR
This paper explores how reinforcement learning gradients can enhance online finetuning of decision transformers. It finds that adding value-based RL gradients improves performance, especially when the offline pretraining data is low-reward.
Contribution
The paper provides a theoretical analysis of the challenges of online finetuning for decision transformers and demonstrates that adding RL gradients, such as those from TD3, improves their adaptation.
Findings
Adding TD3 gradients improves online finetuning of decision transformers.
A Return-To-Go (RTG) that is far from the expected return hampers online finetuning, but RL gradients mitigate this.
The approach is especially effective with low-reward offline data.
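The first finding can be illustrated with a toy sketch. It is not the paper's implementation: the linear policy `a = w * s`, the fixed quadratic `critic_q`, the squared-error stand-in for the supervised loss, and the weight `alpha` are all hypothetical choices made so the example stays self-contained. It only shows the general idea of adding a deterministic policy gradient term (the kind used in TD3's actor update, which ascends Q(s, π(s))) to a supervised action-regression objective.

```python
# Toy sketch: supervised action regression plus a TD3-style actor gradient.
# The linear policy, the fixed critic, and alpha are hypothetical stand-ins.

def critic_q(state, action):
    """Hypothetical fixed critic whose best action is 2 * state."""
    return -(action - 2.0 * state) ** 2

def combined_grad(w, batch, alpha):
    """Gradient w.r.t. w of mean[-Q(s, w*s) + alpha * (w*s - a)^2]."""
    total = 0.0
    for s, a in batch:
        pred = w * s
        rl_grad = 2.0 * s * (pred - 2.0 * s)  # d/dw of -Q(s, w*s)
        sup_grad = 2.0 * s * (pred - a)       # d/dw of (w*s - a)^2
        total += rl_grad + alpha * sup_grad
    return total / len(batch)

def supervised_grad(w, batch):
    """Gradient w.r.t. w of mean[(w*s - a)^2] (no RL term)."""
    return sum(2.0 * s * (w * s - a) for s, a in batch) / len(batch)

def finetune(batch, alpha=1.0, use_rl=True, lr=0.05, steps=500):
    """Plain gradient descent on the chosen objective, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        g = combined_grad(w, batch, alpha) if use_rl else supervised_grad(w, batch)
        w -= lr * g
    return w

def mean_q(w, batch):
    """Average critic value of the policy a = w * s on the batch states."""
    return sum(critic_q(s, w * s) for s, _ in batch) / len(batch)

# Low-reward behavior data: actions are 0.5 * state, far from the critic's optimum.
batch = [(s, 0.5 * s) for s in (0.5, 1.0, 1.5, 2.0)]
w_sup = finetune(batch, use_rl=False)  # imitates the data
w_rl = finetune(batch, use_rl=True)    # pulled toward higher-Q actions
```

In this toy setting, the supervised-only policy settles on the low-reward behavior in the data, while the combined objective moves the policy toward actions the critic rates more highly. How strongly it moves depends on `alpha`, which here is an arbitrary illustrative value.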
Abstract
Decision Transformers have recently emerged as a new and compelling paradigm for offline Reinforcement Learning (RL), completing a trajectory in an autoregressive way. While improvements have been made to overcome initial shortcomings, online finetuning of decision transformers has been surprisingly under-explored. The widely adopted state-of-the-art Online Decision Transformer (ODT) still struggles when pretrained with low-reward offline data. In this paper, we theoretically analyze the online finetuning of the decision transformer, showing that the commonly used Return-To-Go (RTG) that is far from the expected return hampers the online finetuning process. This problem, however, is well-addressed by the value function and advantage of standard RL algorithms. As suggested by our analysis, in our experiments, we hence find that simply adding TD3 gradients to the finetuning process of ODT…
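For readers unfamiliar with the Return-To-Go quantity mentioned above, here is a minimal sketch of the usual definition: the (optionally discounted) sum of rewards from a timestep to the end of the trajectory. The function name `returns_to_go` and the `gamma` parameter are illustrative assumptions, not names taken from the paper or its code.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return a list whose entry t is the sum of gamma^(k-t) * rewards[k] for k >= t."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry reuses the suffix sum computed so far.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

rewards = [1.0, 0.0, 2.0]
rtg = returns_to_go(rewards)  # [3.0, 2.0, 2.0]
```

A decision transformer is typically conditioned on such a value as a target. The abstract's point is that when this target is far from the return the policy can actually be expected to achieve, online finetuning suffers.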
Taxonomy
Topics: Neural Networks and Reservoir Computing
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Clipped Double Q-learning
