Emergent Agentic Transformer from Chain of Hindsight Experience
Hao Liu, Pieter Abbeel

TL;DR
This paper introduces the Agentic Transformer, a novel model trained with chain of hindsight experience relabeling, enabling it to learn from sub-optimal trials and improve performance on RL benchmarks.
Contribution
The paper presents the first transformer-based RL policy that can learn from multiple sub-optimal trajectories by relabeling each trajectory's target return to the maximum total reward in its sequence, outperforming prior methods.
Findings
Performs competitively with temporal-difference (TD) and imitation learning methods on benchmarks.
Bigger models consistently yield better results.
First to use transformer with chain of hindsight relabeling in RL.
Abstract
Large transformer models powered by diverse data and model scale have dominated natural language modeling and computer vision and pushed the frontier of multiple AI areas. In reinforcement learning (RL), despite many efforts on transformer-based policies, a key limitation is that current transformer-based policies cannot learn by directly combining information from multiple sub-optimal trials. In this work, we address this issue using the recently proposed chain of hindsight to relabel experience, where we train a transformer on a sequence of trajectory experiences sorted in ascending order of their total rewards. Our method consists of relabeling the target return of each trajectory to the maximum total reward among the sequence of trajectories and training an autoregressive model to predict actions conditioned on past states, actions, rewards, target returns, and task completion…
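The relabeling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Trajectory` container, its field names, and the `build_hindsight_chain` helper are hypothetical names introduced here for illustration. The sketch covers only the two steps the abstract states explicitly (sorting trajectories in ascending order of total reward, then setting every target return to the maximum total reward in the sequence). It omits the task-completion signal and the autoregressive model itself.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Trajectory:
    """Hypothetical container for one trial; field names are assumptions."""
    states: List[Sequence[float]]
    actions: List[int]
    rewards: List[float]
    target_return: float = 0.0  # overwritten by relabeling

    @property
    def total_reward(self) -> float:
        return sum(self.rewards)


def build_hindsight_chain(trajectories: List[Trajectory]) -> List[Trajectory]:
    """Sort trials by total reward (ascending) and relabel target returns.

    Every trajectory in the returned chain carries the same target return:
    the maximum total reward found among the input trajectories.
    """
    if not trajectories:
        return []
    # Worst trial first, best trial last.
    chain = sorted(trajectories, key=lambda t: t.total_reward)
    best = chain[-1].total_reward
    # Build new objects so the caller's trajectories are left unmodified.
    return [
        Trajectory(t.states, t.actions, t.rewards, target_return=best)
        for t in chain
    ]


trials = [
    Trajectory(states=[[0.0]], actions=[0], rewards=[1.0]),
    Trajectory(states=[[0.0], [1.0]], actions=[1, 0], rewards=[2.0, 3.0]),
    Trajectory(states=[[0.0], [1.0]], actions=[0, 1], rewards=[1.5, 1.5]),
]
chain = build_hindsight_chain(trials)
```

In a full pipeline, the relabeled chain would presumably be flattened into one token sequence for the autoregressive model, so that the best trial appears last and the model can attend to the earlier, worse trials when predicting its actions.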
Taxonomy
Topics: Reinforcement Learning in Robotics · Online Learning and Analytics · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Test · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam
