Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

Anirudhan Badrinath; Yannis Flet-Berliac; Allen Nie; Emma Brunskill

arXiv:2306.14069·cs.LG·November 21, 2023·2 cites

TL;DR

The paper introduces the Waypoint Transformer, an enhanced offline reinforcement learning method that incorporates intermediate targets to improve performance and stability across challenging environments.

Contribution

It presents the Waypoint Transformer architecture, building on decision transformers, with automatic waypoint conditioning to better connect suboptimal trajectories.
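
As a rough illustration of what conditioning on an intermediate target can look like, the sketch below pairs each state in an offline trajectory with a state a fixed number of steps ahead and concatenates the two into one policy input. This is a minimal sketch under stated assumptions, not the paper's implementation: the summary here does not say how waypoints are generated, so the look-ahead rule, the horizon `k`, and the function name `build_waypoint_inputs` are all hypothetical.

```python
import numpy as np


def build_waypoint_inputs(states, k):
    """Pair each state with the state k steps later in the same trajectory.

    Assumption (not from the source): a waypoint is the state k steps ahead,
    clamped to the final state near the end of the trajectory.
    """
    states = np.asarray(states, dtype=float)
    steps = np.arange(len(states))
    # Index of the look-ahead state, clamped so it never runs past the end.
    target_idx = np.minimum(steps + k, len(states) - 1)
    waypoints = states[target_idx]
    # Conditioned input: current state concatenated with its waypoint.
    return np.concatenate([states, waypoints], axis=-1)


# Toy 2-D trajectory with four time steps (illustrative data only).
trajectory = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]])
inputs = build_waypoint_inputs(trajectory, k=2)
```

Each row of `inputs` holds the current state followed by its target, so a supervised policy trained on such rows sees a nearby goal rather than only a distant final outcome.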

Findings

1. Significant increase in final return over existing RvS methods

2. Performance on par with or better than state-of-the-art temporal difference methods

3. Largest improvements in challenging environments and data configurations

Abstract

Despite the recent advancements in offline reinforcement learning via supervised learning (RvS) and the success of the decision transformer (DT) architecture in various domains, DTs have fallen short in several challenging benchmarks. The root cause of this underperformance lies in their inability to seamlessly connect segments of suboptimal trajectories. To overcome this limitation, we present a novel approach to enhance RvS methods by integrating intermediate targets. We introduce the Waypoint Transformer (WT), which uses an architecture that builds upon the DT framework and is conditioned on automatically generated waypoints. The results show a significant increase in the final return compared to existing RvS methods, with performance on par with or greater than existing state-of-the-art temporal difference learning-based methods. Additionally, the performance and stability improvements are…

Videos

Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications

Methods: Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing