Generalized Decision Transformer for Offline Hindsight Information Matching
Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu

TL;DR
This paper introduces the Generalized Decision Transformer (GDT), a versatile framework for offline hindsight information matching that enhances multi-task reinforcement learning by matching various future state statistics.
Contribution
The paper proposes GDT, a unified approach that generalizes Decision Transformer to match different statistics of future trajectory information, and introduces two new variants, Categorical DT (CDT) and Bi-directional DT (BDT), for offline multi-task learning.
Findings
CDT enables effective offline multi-task state-marginal matching.
BDT outperforms DT variants in offline multi-task imitation learning.
GDT expands the application of sequence modeling in reinforcement learning.
Abstract
How to extract as much learning signal as possible from each trajectory has been a key problem in reinforcement learning (RL), where sample inefficiency has posed serious challenges for practical applications. Recent works have shown that using expressive policy function approximators and conditioning on future trajectory information -- such as future states in hindsight experience replay or returns-to-go in Decision Transformer (DT) -- enables efficient learning of multi-task policies, where at times online RL is fully replaced by offline behavioral cloning, e.g. sequence modeling. We demonstrate that all these approaches are doing hindsight information matching (HIM) -- training policies that can output the rest of a trajectory that matches some statistics of future state information. We present Generalized Decision Transformer (GDT) for solving any HIM problem, and show how different choices…
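To make "statistics of future state information" concrete, here is a minimal sketch of two such statistics computed in hindsight from a single trajectory: the undiscounted returns-to-go that DT conditions on, and a normalized histogram of a scalar state feature over the remaining steps, which is one plausible target for state-marginal matching. The function names, the bin edges, and the `velocities` feature are illustrative assumptions, not taken from the paper's code.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the trajectory."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    out.reverse()
    return out


def future_histogram(
    values: Sequence[float], t: int, edges: Sequence[float]
) -> List[float]:
    """Normalized histogram of values[t:] over half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values[t:]:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Hypothetical three-step trajectory.
rewards = [1.0, 0.0, 2.0]
velocities = [0.1, 0.6, 0.7]  # assumed scalar state feature

rtg = returns_to_go(rewards)                              # [3.0, 2.0, 2.0]
hist = future_histogram(velocities, 1, [0.0, 0.5, 1.0])   # [0.0, 1.0]
```

In both cases the conditioning signal at step t depends only on what happens from t onward, which is what lets the same trajectory be relabeled with different targets after the fact.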
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Adam · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Dense Connections · Softmax
