Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens

Joseph Clinton; Robert Lieck

arXiv:2409.09513 · cs.LG · September 17, 2024

TL;DR

The paper introduces Planning Tokens, a novel approach that incorporates high-level, long-term planning information into offline reinforcement learning models, significantly improving performance on long-horizon tasks and enhancing interpretability.

Contribution

The paper proposes a new architecture whose Planning Tokens encode long-term plans, addressing the compounding error that limits auto-regressive models on long-horizon RL tasks.

Findings

01. Achieves state-of-the-art results on complex D4RL environments.

02. Improves interpretability through plan visualizations and attention maps.

03. Reduces compounding error in long-horizon predictions.

Abstract

Supervised learning approaches to offline reinforcement learning, particularly those utilizing the Decision Transformer, have shown effectiveness in continuous environments and for sparse rewards. However, they often struggle with long-horizon tasks due to the high compounding error of auto-regressive models. To overcome this limitation, we go beyond next-token prediction and introduce Planning Tokens, which contain high-level, long time-scale information about the agent's future. Predicting dual time-scale tokens at regular intervals enables our model to use these long-horizon Planning Tokens as a form of implicit planning to guide its low-level policy and reduce compounding error. This architectural modification significantly enhances performance on long-horizon tasks, establishing a new state-of-the-art in complex D4RL environments. Additionally, we demonstrate that Planning Tokens…
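The idea of mixing two time scales in one token stream can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that a plan is a handful of future states sampled at a coarse stride from an offline trajectory, and that a fresh plan is inserted every `replan_every` steps. The helper names `extract_plan` and `build_sequence`, the token labels, and the choice of raw states as plan tokens are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Token = Tuple[str, Vector]  # (kind, payload); kind is "plan", "state" or "action"


def extract_plan(states: Sequence[Sequence[float]], start: int,
                 num_plan_tokens: int) -> List[Vector]:
    """Sample future states at a coarse, even stride (assumed plan format).

    The returned states lie between `start` (exclusive) and the final
    state of the trajectory (inclusive).
    """
    remaining = len(states) - 1 - start
    if remaining <= 0 or num_plan_tokens <= 0:
        return []
    plan = []
    for i in range(1, num_plan_tokens + 1):
        # Integer arithmetic keeps the index choice deterministic.
        idx = start + (i * remaining) // num_plan_tokens
        plan.append(tuple(states[idx]))
    return plan


def build_sequence(states: Sequence[Sequence[float]],
                   actions: Sequence[Sequence[float]],
                   replan_every: int,
                   num_plan_tokens: int) -> List[Token]:
    """Interleave coarse plan tokens with fine-grained state/action tokens."""
    sequence: List[Token] = []
    for t in range(len(actions)):
        if t % replan_every == 0:
            # Long time-scale tokens: a sparse summary of the future.
            for waypoint in extract_plan(states, t, num_plan_tokens):
                sequence.append(("plan", waypoint))
        # Short time-scale tokens: the usual per-step state and action.
        sequence.append(("state", tuple(states[t])))
        sequence.append(("action", tuple(actions[t])))
    return sequence


# Toy trajectory: an agent moving along the x-axis for 9 steps.
demo_states = [(float(i), 0.0) for i in range(10)]
demo_actions = [(1.0,) for _ in range(9)]
demo_sequence = build_sequence(demo_states, demo_actions,
                               replan_every=5, num_plan_tokens=2)
```

In this sketch, a sequence model trained on `demo_sequence` would see the coarse waypoints before the low-level steps they summarize, which is one way the "implicit planning" described in the abstract could condition the low-level policy. At inference time the plan tokens would have to be predicted by the model rather than read from the dataset.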

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Residual Connection · Linear Layer