Generalized Decision Transformer for Offline Hindsight Information Matching

Hiroki Furuta; Yutaka Matsuo; Shixiang Shane Gu

arXiv:2111.10364 · cs.LG · February 7, 2022 · 5 cites

TL;DR

This paper introduces the Generalized Decision Transformer (GDT), a versatile framework for offline hindsight information matching that enhances multi-task reinforcement learning by matching various future state statistics.

Contribution

The paper proposes GDT, a unified approach that generalizes Decision Transformer (DT) to match different statistics of future states. It introduces two new variants, Categorical DT (CDT) and Bi-directional DT (BDT), for offline multi-task learning.

Findings

01 CDT enables effective offline multi-task state-marginal matching.

02 BDT outperforms DT variants in offline multi-task imitation learning.

03 GDT broadens the range of reinforcement learning problems that can be solved with sequence modeling.
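Finding 01 refers to state-marginal matching. The policy is conditioned on a target distribution of future states, and it is judged by how closely the distribution of states it actually visits matches that target. The sketch below shows one way such a comparison could be computed. It assumes 1-D states discretized into equal-width bins and uses the 1-D Wasserstein-1 distance between histograms as the metric. The function names, the binning, the toy data, and the choice of metric are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List, Sequence


def state_histogram(states: Sequence[float], bins: int, low: float, high: float) -> List[float]:
    """Normalized histogram of 1-D states over `bins` equal-width bins on [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for s in states:
        # Clip out-of-range states into the first or last bin.
        idx = min(max(int((s - low) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(states) for c in counts]


def wasserstein_1d(p: Sequence[float], q: Sequence[float], bin_width: float) -> float:
    """Wasserstein-1 distance between two histograms on the same equally spaced bins.

    In 1-D, W1 is the area between the two CDFs: sum_i |F_p(i) - F_q(i)| * bin_width.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    cdf_p = cdf_q = dist = 0.0
    for p_i, q_i in zip(p, q):
        cdf_p += p_i
        cdf_q += q_i
        dist += abs(cdf_p - cdf_q) * bin_width
    return dist


# Usage with hypothetical data: the target comes from states in a dataset trajectory,
# and the rollout holds states visited by a policy conditioned on that target.
target = state_histogram([0.1, 0.2, 0.3, 0.7], bins=4, low=0.0, high=1.0)
rollout = state_histogram([0.1, 0.3, 0.6, 0.8], bins=4, low=0.0, high=1.0)
print(wasserstein_1d(target, rollout, bin_width=0.25))  # 0.1875
```

In 1-D, Wasserstein-1 respects the ordering of the bins. Mass that lands in a neighboring bin therefore costs less than mass that lands far away. A bin-wise metric such as total variation would not make this distinction.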

Abstract

How to extract as much learning signal as possible from each trajectory has been a key problem in reinforcement learning (RL), where sample inefficiency has posed serious challenges for practical applications. Recent works have shown that using expressive policy function approximators and conditioning on future trajectory information, such as future states in hindsight experience replay or returns-to-go in Decision Transformer (DT), enables efficient learning of multi-task policies, where at times online RL is fully replaced by offline behavioral cloning, e.g. sequence modeling. We demonstrate that all these approaches are doing hindsight information matching (HIM): training policies that can output the rest of a trajectory that matches some statistics of future state information. We present Generalized Decision Transformer (GDT) for solving any HIM problem, and show how different choices…
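The HIM framing becomes concrete once you write down the future statistic that each method conditions on. The minimal pure-Python sketch below assumes scalar rewards and 1-D scalar states. It computes three statistics: returns-to-go for DT, the final state of the trajectory as an HER-style goal (HER's "final" relabeling strategy), and a normalized histogram of future states as a stand-in for a CDT-style categorical target. The function names, the 1-D states, and the equal-width binning are illustrative assumptions, not the code in the official repository.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """DT-style statistic: (optionally discounted) sum of rewards from step t to the end."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def final_state_goal(states: Sequence[float]) -> List[float]:
    """HER-style statistic ("final" strategy): the last state reached, used as the goal at every step."""
    return [states[-1]] * len(states)


def future_state_histogram(states: Sequence[float], t: int, bins: int,
                           low: float, high: float) -> List[float]:
    """CDT-style statistic (illustrative): normalized histogram of the states s_t, ..., s_T.

    Assumes 1-D states and equal-width bins on [low, high]; out-of-range
    states are clipped into the edge bins.
    """
    width = (high - low) / bins
    counts = [0] * bins
    future = states[t:]
    for s in future:
        idx = min(max(int((s - low) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(future) for c in counts]


# Usage: relabel one hypothetical toy trajectory with each statistic.
rewards = [1.0, 0.0, 2.0]
states = [0.1, 0.4, 0.9]
print(returns_to_go(rewards))                          # [3.0, 2.0, 2.0]
print(final_state_goal(states))                        # [0.9, 0.9, 0.9]
print(future_state_histogram(states, 1, 2, 0.0, 1.0))  # [0.5, 0.5]
```

In each case, the policy input at step t pairs the current state with a statistic computed only from step t onward. Swapping the statistic is what distinguishes one HIM problem from another.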

Code & Models

Repositories

frt03/generalized_dt (PyTorch, official)

Videos

Generalized Decision Transformer for Offline Hindsight Information Matching (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Adam · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Dense Connections · Softmax