Future-conditioned Unsupervised Pretraining for Decision Transformer
Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li

TL;DR
This paper introduces Pretrained Decision Transformer (PDT), a novel unsupervised pretraining method for offline reinforcement learning that uses future trajectory information to improve decision-making and generalization from reward-free data.
Contribution
The paper proposes PDT, a simple yet effective approach for unsupervised RL pretraining that leverages future trajectory information, enabling better generalization and better use of sub-optimal data.
Findings
PDT outperforms supervised pretraining on sub-optimal data.
PDT can extract diverse behaviors from offline data.
PDT allows controllable sampling of high-return behaviors during finetuning.
Abstract
Recent research in offline reinforcement learning (RL) has demonstrated that return-conditioned supervised learning is a powerful paradigm for decision-making problems. While promising, return conditioning is limited to training data labeled with rewards and therefore faces challenges in learning from unsupervised data. In this work, we aim to utilize generalized future conditioning to enable efficient unsupervised pretraining from reward-free and sub-optimal offline data. We propose Pretrained Decision Transformer (PDT), a conceptually simple approach for unsupervised RL pretraining. PDT leverages future trajectory information as a privileged context to predict actions during training. The ability to make decisions based on both present and future factors enhances PDT's capability for generalization. Besides, this feature can be easily incorporated into a return-conditioned framework…
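The contrast the abstract draws can be sketched in a few lines. Return conditioning needs a reward for every step, whereas future conditioning needs only states and actions. The sketch below is illustrative rather than the authors' implementation: the function names are hypothetical, and averaging the next few states stands in for whatever learned future encoder PDT actually uses.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = int


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Return conditioning: the sum of rewards from each step onward.

    This requires reward labels, which reward-free data does not have.
    """
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]


def summarize_future(states: Sequence[State], t: int, horizon: int) -> State:
    """Hypothetical future encoder: mean of up to `horizon` states after step t.

    Assumption: a simple average replaces a learned encoder. At the end of the
    trajectory, where no future states remain, it falls back to the current state.
    """
    future = list(states[t + 1 : t + 1 + horizon]) or [states[t]]
    dim = len(states[t])
    return tuple(sum(s[i] for s in future) / len(future) for i in range(dim))


def build_future_conditioned_examples(
    states: Sequence[State], actions: Sequence[Action], horizon: int = 2
) -> List[Tuple[State, State, Action]]:
    """Build (state, future_context, action) training triples without rewards.

    A policy trained on these triples predicts the action from the current
    state plus a summary of what happens next.
    """
    return [
        (states[t], summarize_future(states, t, horizon), action)
        for t, action in enumerate(actions)
    ]


if __name__ == "__main__":
    states = [(0.0,), (1.0,), (2.0,), (3.0,)]
    actions = [1, 1, 1, 0]
    for example in build_future_conditioned_examples(states, actions):
        print(example)
```

The future context is available only during training, which is why the abstract calls it privileged. At decision time a model of this kind would need some other way to supply the context, a step this sketch does not cover.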
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Residual Connection
