Online Decision Transformer

Qinqing Zheng; Amy Zhang; Aditya Grover

arXiv:2202.05607 · cs.LG · July 14, 2022 · 21 cites

TL;DR

Online Decision Transformers (ODT) integrate offline pretraining with online finetuning using sequence modeling and entropy regularization, leading to improved sample efficiency and performance in reinforcement learning tasks.

Contribution

The paper introduces ODT, a novel RL algorithm that unifies offline pretraining and online finetuning within a sequence modeling framework.
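ODT builds on the Decision Transformer formulation (Chen et al., 2021), in which each action is predicted from a return-to-go (the sum of rewards still to be collected) together with the current state. The following is a minimal sketch, assuming undiscounted returns, of how one trajectory can be flattened into the interleaved (return-to-go, state, action) sequence that such a model consumes. The helper names `returns_to_go` and `to_token_sequence` are hypothetical, and embeddings, positional encodings and the transformer itself are omitted.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: g_t = r_t + r_{t+1} + ... + r_T (assumed, no discount)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each entry accumulates all rewards from t onward.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_token_sequence(
    states: Sequence[Any], actions: Sequence[Any], rewards: Sequence[float]
) -> List[Tuple[str, Any]]:
    """Interleave (return-to-go, state, action) per timestep into one flat sequence."""
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")
    tokens: List[Tuple[str, Any]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        # Tag each element with its modality so a model can use separate embeddings.
        tokens.extend([("rtg", g), ("state", s), ("action", a)])
    return tokens
```

For rewards `[1.0, 0.0, 2.0]`, `returns_to_go` yields `[3.0, 2.0, 2.0]`. At evaluation time, Decision Transformer-style models replace the first return-to-go with a desired target return and decrement it by the rewards actually received.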

Findings

1. ODT achieves performance competitive with the state of the art on the D4RL benchmark.

2. ODT shows significant gains during online finetuning.

3. Entropy regularization enables sample-efficient exploration.

Abstract

Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant…
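One way to pair an autoregressive likelihood objective with a sequence-level entropy regularizer is to treat the entropy as a constraint (entropy at least a target value beta) and enforce it with a nonnegative Lagrange multiplier, or temperature, lambda, similar to automatic temperature tuning in SAC. The sketch below illustrates that idea under loudly stated assumptions: a plain diagonal Gaussian action head (so entropy has a closed form), NLL and entropy averaged over the K positions of one context window, and a simple projected dual-ascent step for lambda. All function names are hypothetical, and the paper's actual parameterization and optimizer may differ.

```python
import math
from typing import List, Sequence, Tuple

LOG_2PI = math.log(2.0 * math.pi)


def gaussian_log_prob(a: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Log-density of action a under an (assumed) diagonal Gaussian N(mu, diag(sigma^2))."""
    return sum(
        -0.5 * ((ai - mi) / si) ** 2 - math.log(si) - 0.5 * LOG_2PI
        for ai, mi, si in zip(a, mu, sigma)
    )


def gaussian_entropy(sigma: Sequence[float]) -> float:
    """Closed-form entropy of a diagonal Gaussian: sum_i [log(sigma_i) + 0.5 * (1 + log 2*pi)]."""
    return sum(math.log(si) + 0.5 * (1.0 + LOG_2PI) for si in sigma)


def sequence_objective(
    actions: List[Sequence[float]],
    mus: List[Sequence[float]],
    sigmas: List[Sequence[float]],
    lam: float,
) -> Tuple[float, float, float]:
    """Policy-side Lagrangian for one context window of K positions.

    Returns (loss, nll, entropy), where nll and entropy are averaged over the
    window and loss = nll - lam * entropy is what the policy would minimize.
    """
    k = len(actions)
    if k == 0 or not (k == len(mus) == len(sigmas)):
        raise ValueError("need K > 0 positions with matching actions, means and stds")
    nll = -sum(gaussian_log_prob(a, m, s) for a, m, s in zip(actions, mus, sigmas)) / k
    ent = sum(gaussian_entropy(s) for s in sigmas) / k
    return nll - lam * ent, nll, ent


def update_temperature(lam: float, entropy: float, target_entropy: float, lr: float) -> float:
    """Projected dual ascent on lam * (beta - entropy): lam rises when entropy is
    below the target beta and falls otherwise, clipped at zero."""
    return max(0.0, lam + lr * (target_entropy - entropy))
```

In a training loop, the policy parameters would take a gradient step on `loss` while `lam` is held fixed, then `lam` would be updated with `update_temperature` using the detached entropy. Raising beta pushes the policy toward more stochastic actions, which is the exploration lever the entropy regularizer provides during finetuning.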

Taxonomy

Topics: Reinforcement Learning in Robotics · Software Engineering Research