Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

Kai Yan; Alexander G. Schwing; Yu-Xiong Wang

arXiv:2410.24108·cs.LG·November 1, 2024

TL;DR

This paper examines how reinforcement learning gradients can improve online finetuning of decision transformers, and finds that adding value-based RL gradients (TD3) helps most when the offline pretraining data is low-reward.

Contribution

The paper gives a theoretical analysis of why online finetuning of decision transformers is difficult and shows that adding RL gradients, such as those of TD3, improves their adaptation.

Findings

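01. Adding TD3 gradients to the Online Decision Transformer (ODT) improves online finetuning.

02. Conditioning on a Return-To-Go (RTG) that is far from the expected return hampers online finetuning; the value function and advantage of standard RL algorithms mitigate this.

03. The approach is especially effective when the offline pretraining data is low-reward.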
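The idea of adding an RL gradient to a supervised finetuning gradient can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' implementation: the scalar linear policy `a = w * s`, the hypothetical quadratic critic `Q(s, a) = -(a - a_star)**2`, the weight `alpha`, and all function names are invented for illustration. The supervised term imitates a (possibly low-reward) dataset action, while the RL term follows the critic's gradient.

```python
def combined_grad(w, s, a_data, a_star, alpha):
    """Gradient w.r.t. w of (a - a_data)^2 - alpha * Q(s, a), with a = w * s.

    Q(s, a) = -(a - a_star)^2 is a hypothetical stand-in for a learned critic.
    """
    a = w * s
    grad_supervised = 2.0 * (a - a_data) * s  # imitate the dataset action
    grad_rl = 2.0 * (a - a_star) * s          # gradient of -Q: move toward a_star
    return grad_supervised + alpha * grad_rl


def finetune(w, s, a_data, a_star, alpha, lr=0.1, steps=200):
    """Plain gradient descent on the combined toy objective."""
    for _ in range(steps):
        w -= lr * combined_grad(w, s, a_data, a_star, alpha)
    return w


# Dataset action is 0.0 (low reward); the toy critic prefers 1.0.
w_supervised_only = finetune(0.3, 1.0, a_data=0.0, a_star=1.0, alpha=0.0)
w_with_rl = finetune(0.3, 1.0, a_data=0.0, a_star=1.0, alpha=1.0)
```

With `alpha = 0.0` the policy settles on the dataset action, however poor it is; with `alpha > 0` the critic's gradient pulls the action toward what the critic values more. In this toy setting the minimizer is `alpha / (1 + alpha)` of the way from `a_data` to `a_star`.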

Abstract

Decision Transformers have recently emerged as a new and compelling paradigm for offline Reinforcement Learning (RL), completing a trajectory in an autoregressive way. While improvements have been made to overcome initial shortcomings, online finetuning of decision transformers has been surprisingly under-explored. The widely adopted state-of-the-art Online Decision Transformer (ODT) still struggles when pretrained with low-reward offline data. In this paper, we theoretically analyze the online-finetuning of the decision transformer, showing that the commonly used Return-To-Go (RTG) that's far from the expected return hampers the online fine-tuning process. This problem, however, is well-addressed by the value function and advantage of standard RL algorithms. As suggested by our analysis, in our experiments, we hence find that simply adding TD3 gradients to the finetuning process of ODT…
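For readers unfamiliar with the Return-To-Go mentioned in the abstract, the usual Decision Transformer convention defines it at step t as the undiscounted sum of rewards from t to the end of the trajectory. A minimal sketch, assuming that convention; the function name `returns_to_go` is illustrative and is not taken from the authors' repository:

```python
def returns_to_go(rewards):
    """Return a list where entry t is the sum of rewards[t:] (undiscounted)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each entry is a suffix sum.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


example = returns_to_go([1.0, 2.0, 3.0])  # [6.0, 5.0, 3.0]
```

At test time the first RTG is a target chosen by the user rather than computed from data, which is why it can end up far from the return the policy actually achieves.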

Code & Models

Repositories

kaiyan289/rl_as_vitamin_for_online_decision_transformers (PyTorch, official)

Videos

Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers · SlidesLive

Taxonomy

Topics: Neural Networks and Reservoir Computing

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Clipped Double Q-learning