Future-conditioned Unsupervised Pretraining for Decision Transformer

Zhihui Xie; Zichuan Lin; Deheng Ye; Qiang Fu; Wei Yang; Shuai Li

arXiv:2305.16683 · cs.LG · May 29, 2023 · 2 cites

TL;DR

This paper introduces Pretrained Decision Transformer (PDT), a novel unsupervised pretraining method for offline reinforcement learning that uses future trajectory information to improve decision-making and generalization from reward-free data.

Contribution

The paper proposes PDT, a simple yet effective approach to unsupervised RL pretraining that leverages future trajectory information, enabling better generalization and more effective use of sub-optimal data.

Findings

01. PDT outperforms supervised pretraining on sub-optimal data.

02. PDT can extract diverse behaviors from offline data.

03. PDT allows controllable sampling of high-return behaviors during finetuning.

Abstract

Recent research in offline reinforcement learning (RL) has demonstrated that return-conditioned supervised learning is a powerful paradigm for decision-making problems. While promising, return conditioning is limited to training data labeled with rewards and therefore faces challenges in learning from unsupervised data. In this work, we aim to utilize generalized future conditioning to enable efficient unsupervised pretraining from reward-free and sub-optimal offline data. We propose Pretrained Decision Transformer (PDT), a conceptually simple approach for unsupervised RL pretraining. PDT leverages future trajectory information as a privileged context to predict actions during training. The ability to make decisions based on both present and future factors enhances PDT's capability for generalization. Besides, this feature can be easily incorporated into a return-conditioned framework…
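The idea of conditioning on the future instead of on returns can be illustrated with a small data-preparation sketch. It turns a reward-free trajectory into (state, future context, action) training samples, so that no reward labels are needed. This is not the authors' implementation: the function names, the fixed `horizon` window, and the use of a simple mean over future states as the context are all assumptions made for illustration. The abstract does not say how the future is encoded.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = int
Sample = Tuple[State, Tuple[float, ...], Action]


def summarize_future(future_states: Sequence[State], dim: int) -> Tuple[float, ...]:
    """Hypothetical placeholder for a future encoder: the mean of future states.

    Returns a zero vector when no future states remain (end of trajectory).
    """
    if not future_states:
        return tuple(0.0 for _ in range(dim))
    n = len(future_states)
    return tuple(sum(s[i] for s in future_states) / n for i in range(dim))


def build_future_conditioned_samples(
    states: Sequence[State],
    actions: Sequence[Action],
    horizon: int,
) -> List[Sample]:
    """Build (state, future_context, action) samples from a reward-free trajectory.

    The future context at step t summarizes up to `horizon` states after t.
    No rewards or returns are used anywhere.
    """
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    if not states:
        return []
    dim = len(states[0])
    samples: List[Sample] = []
    for t in range(len(states)):
        future = states[t + 1 : t + 1 + horizon]
        context = summarize_future(future, dim)
        samples.append((states[t], context, actions[t]))
    return samples


if __name__ == "__main__":
    # Toy one-dimensional trajectory with no reward labels.
    toy_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
    toy_actions = [1, 1, 1, 0]
    for sample in build_future_conditioned_samples(toy_states, toy_actions, horizon=2):
        print(sample)
```

A policy trained on such samples would predict the action from the current state together with the future context. The mean is only a stand-in here; any summary of the future trajectory could take its place.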

Code & Models

Repositories

fffffarmer/pdt (PyTorch, official)

Videos

Future-conditioned Unsupervised Pretraining for Decision Transformer · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Residual Connection