Emergent Agentic Transformer from Chain of Hindsight Experience

Hao Liu; Pieter Abbeel

arXiv:2305.16554·cs.LG·May 29, 2023·1 cites

Open Access

TL;DR

This paper introduces the Agentic Transformer, a novel model trained with chain of hindsight experience relabeling, enabling it to learn from sub-optimal trials and improve performance on RL benchmarks.

Contribution

The paper presents the first transformer-based RL policy that can learn from multiple sub-optimal trajectories by relabeling experiences with maximum rewards, outperforming prior methods.

Findings

01. Performs competitively with temporal-difference (TD) and imitation learning methods on RL benchmarks.

02. Bigger models consistently yield better results.

03. First to use a transformer with chain of hindsight relabeling in RL.

Abstract

Large transformer models powered by diverse data and model scale have dominated natural language modeling and computer vision and pushed the frontier of multiple AI areas. In reinforcement learning (RL), despite many efforts on transformer-based policies, a key limitation is that current transformer-based policies cannot learn by directly combining information from multiple sub-optimal trials. In this work, we address this issue using the recently proposed chain of hindsight to relabel experience, where we train a transformer on a sequence of trajectory experiences sorted in ascending order of their total rewards. Our method consists of relabeling the target return of each trajectory to the maximum total reward among the sequence of trajectories and training an autoregressive model to predict actions conditioned on past states, actions, rewards, target returns, and task completion…
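The relabeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Trajectory` container, the `build_chain` helper, and the dictionary layout are hypothetical names chosen for illustration. The sketch covers only sorting and return relabeling. It leaves out the task-completion signal and the model itself, because the abstract does not specify them in enough detail here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    """Hypothetical container for one trial (not from the paper's code)."""
    states: List[float]
    actions: List[int]
    rewards: List[float]

    @property
    def total_reward(self) -> float:
        # Total reward of the trial: the sum of its per-step rewards.
        return sum(self.rewards)


def build_chain(trajectories: List[Trajectory]) -> List[Dict]:
    """Sort trials by ascending total reward and relabel their target return.

    Every trajectory in the chain receives the same target return: the
    maximum total reward found among the trajectories in the chain.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory to build a chain")

    # Ascending order, so the best trial comes last in the sequence.
    ordered = sorted(trajectories, key=lambda t: t.total_reward)
    max_return = ordered[-1].total_reward

    return [{"trajectory": t, "target_return": max_return} for t in ordered]


if __name__ == "__main__":
    trials = [
        Trajectory(states=[0.0, 0.1], actions=[0, 1], rewards=[1.0, 2.0]),
        Trajectory(states=[0.0], actions=[1], rewards=[5.0]),
        Trajectory(states=[0.0], actions=[0], rewards=[0.5]),
    ]
    for item in build_chain(trials):
        print(item["trajectory"].total_reward, item["target_return"])
```

In this sketch, a sequence model would then be trained autoregressively on the concatenated chain, so that earlier, lower-reward trials serve as context for later, higher-reward ones. That training step is outside the scope of the example.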

Taxonomy

Topics: Reinforcement Learning in Robotics · Online Learning and Analytics · Explainable Artificial Intelligence (XAI)

Methods: Multi-Head Attention · Attention Is All You Need · Test · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam