Online Decision Transformer
Qinqing Zheng, Amy Zhang, Aditya Grover

TL;DR
Online Decision Transformers (ODT) integrate offline pretraining with online finetuning using sequence modeling and entropy regularization, leading to improved sample efficiency and performance in reinforcement learning tasks.
Contribution
The paper introduces ODT, a novel RL algorithm that unifies offline pretraining and online finetuning within a sequence modeling framework.
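ODT builds on the Decision Transformer formulation (Chen et al., 2021), in which a trajectory is cast as a sequence of (return-to-go, state, action) tokens. The sketch below shows how such a sequence can be built from one trajectory. It is a minimal illustration that assumes undiscounted returns-to-go and a fixed context window. The helper names `returns_to_go` and `to_token_sequence` are hypothetical and are not the authors' code.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: g_t = r_t + r_{t+1} + ... + r_T."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate rewards backwards from the final timestep.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_token_sequence(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
    context_len: int,
) -> List[Tuple[str, Any]]:
    """Interleave (g_t, s_t, a_t) per timestep, keeping only the last `context_len` steps."""
    if context_len <= 0:
        raise ValueError("context_len must be positive")
    # Returns-to-go are computed over the full trajectory *before* windowing,
    # so each g_t still reflects all future rewards.
    g = returns_to_go(rewards)
    window = list(zip(g, states, actions))[-context_len:]
    tokens: List[Tuple[str, Any]] = []
    for g_t, s_t, a_t in window:
        tokens.extend([("rtg", g_t), ("state", s_t), ("action", a_t)])
    return tokens
```

For example, rewards `[1.0, 0.0, 2.0]` give returns-to-go `[3.0, 2.0, 2.0]`. In an actual model each token type would be embedded separately before entering the transformer. The string tags here only mark token types.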
Findings
ODT achieves competitive performance on the D4RL benchmark.
ODT shows much larger gains from online finetuning.
Sample-efficient exploration is enabled by entropy regularization.
Abstract
Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant…
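The abstract pairs an autoregressive modeling objective with a sequence-level entropy regularizer. The sketch below shows one way to realize this pairing. It rests on several assumptions. The action head is a diagonal Gaussian. The regularizer is read as a constraint that the average policy entropy over the K context steps stay above a target β. That constraint is enforced with a SAC-style Lagrange multiplier (temperature) λ. All function names are hypothetical, and the plain-Python arithmetic stands in for the tensor operations and gradients a real implementation would use.

```python
import math
from typing import Sequence, Tuple

LOG_2PI = math.log(2 * math.pi)


def gaussian_nll(action: Sequence[float], mean: Sequence[float], log_std: Sequence[float]) -> float:
    """Negative log-likelihood of one action under a diagonal Gaussian policy head."""
    nll = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        z = (a - mu) / math.exp(ls)
        nll += 0.5 * z * z + ls + 0.5 * LOG_2PI
    return nll


def gaussian_entropy(log_std: Sequence[float]) -> float:
    """Differential entropy of a diagonal Gaussian: sum_i (0.5 * log(2*pi*e) + log sigma_i)."""
    return sum(0.5 * (LOG_2PI + 1.0) + ls for ls in log_std)


def policy_loss(
    actions: Sequence[Sequence[float]],
    means: Sequence[Sequence[float]],
    log_stds: Sequence[Sequence[float]],
    lam: float,
) -> Tuple[float, float]:
    """Sequence-averaged NLL over the K context actions minus lam times the
    sequence-averaged entropy. Returns (loss, entropy)."""
    k = len(actions)
    nll = sum(gaussian_nll(a, m, s) for a, m, s in zip(actions, means, log_stds)) / k
    ent = sum(gaussian_entropy(s) for s in log_stds) / k
    return nll - lam * ent, ent


def update_temperature(lam: float, entropy: float, beta: float, lr: float) -> float:
    """One projected dual-descent step on lam for the constraint entropy >= beta:
    lam grows while entropy is below beta, shrinks while it is above, and never
    goes below 0."""
    return max(0.0, lam - lr * (entropy - beta))
```

The two functions are meant to be called in alternation. First, take a policy step on `policy_loss` with λ held fixed. Then, take a temperature step with the measured entropy. Under this design, a policy that becomes too deterministic during finetuning raises λ, which strengthens the entropy bonus and pushes the policy back toward exploration.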
Taxonomy
Topics: Reinforcement Learning in Robotics · Software Engineering Research
