Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens
Joseph Clinton, Robert Lieck

TL;DR
The paper introduces Planning Tokens, a novel approach that incorporates high-level, long-term planning information into offline reinforcement learning models, significantly improving performance on long-horizon tasks and enhancing interpretability.
Contribution
It proposes a new architecture with Planning Tokens that encode long-term planning, addressing the limitations of auto-regressive models in long-horizon RL tasks.
Findings
Achieves state-of-the-art results on complex D4RL environments.
Improves interpretability through plan visualizations and attention maps.
Reduces compounding error in long-horizon predictions.
Abstract
Supervised learning approaches to offline reinforcement learning, particularly those utilizing the Decision Transformer, have shown effectiveness in continuous environments and for sparse rewards. However, they often struggle with long-horizon tasks due to the high compounding error of auto-regressive models. To overcome this limitation, we go beyond next-token prediction and introduce Planning Tokens, which contain high-level, long time-scale information about the agent's future. Predicting dual time-scale tokens at regular intervals enables our model to use these long-horizon Planning Tokens as a form of implicit planning to guide its low-level policy and reduce compounding error. This architectural modification significantly enhances performance on long-horizon tasks, establishing a new state-of-the-art in complex D4RL environments. Additionally, we demonstrate that Planning Tokens…
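The dual time-scale idea in the abstract can be illustrated with a small sketch of how a token sequence might be laid out. This is not the authors' implementation: the token layout, the function names (`make_plan`, `interleave`), and the parameters (`replan_every`, `horizon`, `stride`) are all hypothetical, and each token is reduced to a `(kind, timestep)` pair instead of a learned embedding. The sketch assumes a plan is a coarse, subsampled look-ahead over future timesteps that is refreshed at a fixed interval and placed before the low-level state and action tokens it guides.

```python
from typing import List, Tuple

# Hypothetical stand-in for an embedded token: (kind, timestep index).
Token = Tuple[str, int]


def make_plan(t: int, horizon: int, stride: int, n_steps: int) -> List[Token]:
    """Coarse look-ahead from step t: every `stride`-th future step,
    up to `horizon` steps ahead, clipped to the end of the trajectory."""
    last = min(t + horizon, n_steps - 1)
    return [("plan", i) for i in range(t + stride, last + 1, stride)]


def interleave(n_steps: int, replan_every: int, horizon: int, stride: int) -> List[Token]:
    """Build a sequence with plan tokens inserted at regular intervals,
    followed by the fine-grained (state, action) tokens for each step."""
    seq: List[Token] = []
    for t in range(n_steps):
        if t % replan_every == 0:
            # Long time-scale tokens: refreshed every `replan_every` steps.
            seq.extend(make_plan(t, horizon, stride, n_steps))
        # Short time-scale tokens: one state and one action per step.
        seq.append(("state", t))
        seq.append(("action", t))
    return seq


seq = interleave(n_steps=6, replan_every=3, horizon=4, stride=2)
for token in seq:
    print(token)
```

In this toy layout the low-level tokens between two replanning points can attend to a plan that already spans several future steps, which is one way to read the abstract's claim that long-horizon tokens guide the low-level policy. At inference time the plan entries would presumably be predicted by the model instead of being read from a recorded trajectory.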
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Residual Connection · Linear Layer
