Behavior Transformers: Cloning $k$ modes with one stone
Nur Muhammad Mahi Shafiullah, Zichen Jeff Cui, Ariuntuya Altanzaya, Lerrel Pinto

TL;DR
Behavior Transformer (BeT) is a novel method that models multi-modal, unlabeled demonstration data using transformer architectures, significantly improving performance in robotic manipulation and self-driving tasks by capturing diverse behaviors.
Contribution
BeT introduces a transformer-based approach that combines action discretization with a multi-task action correction to model multi-modal demonstration data that carries no reward labels, a setting in which current Offline RL and Behavioral Cloning methods are limited.
Findings
BeT outperforms prior state-of-the-art methods on various robotic and self-driving datasets.
BeT effectively captures the major modes in demonstration datasets.
Extensive ablation studies highlight the importance of each component in BeT.
Abstract
While behavior learning has made impressive progress in recent times, it lags behind computer vision and natural language processing due to its inability to leverage large, human-generated datasets. Human behaviors have wide variance, multiple modes, and human demonstrations typically do not come with reward labels. These properties limit the applicability of current methods in Offline RL and Behavioral Cloning to learn from large, pre-collected datasets. In this work, we present Behavior Transformer (BeT), a new technique to model unlabeled demonstration data with multiple modes. BeT retrofits standard transformer architectures with action discretization coupled with a multi-task action correction inspired by offset prediction in object detection. This allows us to leverage the multi-modal modeling ability of modern transformers to predict multi-modal continuous actions. We…
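The pairing of action discretization with an offset-style correction can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the discrete bins come from k-means clustering of the demonstrated actions, which the abstract does not specify, and the function names (fit_action_centers, encode_action, decode_action) are hypothetical. Each continuous action is split into a bin index, which a classifier head could predict, and a continuous offset from the bin center, which a regression head could predict.

```python
import numpy as np


def fit_action_centers(actions, k, iters=20, seed=0):
    """Plain k-means over continuous actions (assumed binning scheme).

    actions: array of shape (n, action_dim). Returns centers of shape (k, action_dim).
    """
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct demonstrated actions.
    centers = actions[rng.choice(len(actions), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance from every action to every center: shape (n, k).
        dists = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = actions[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers


def encode_action(action, centers):
    """Split one continuous action into (nearest bin index, residual offset)."""
    bin_idx = int(np.linalg.norm(centers - action, axis=1).argmin())
    offset = action - centers[bin_idx]
    return bin_idx, offset


def decode_action(bin_idx, offset, centers):
    """Recover a continuous action from a bin index and its offset."""
    return centers[bin_idx] + offset


if __name__ == "__main__":
    # Toy two-mode action dataset (synthetic, for illustration only).
    rng = np.random.default_rng(1)
    demo = np.concatenate([
        rng.normal(-1.0, 0.05, size=(50, 2)),
        rng.normal(1.0, 0.05, size=(50, 2)),
    ])
    centers = fit_action_centers(demo, k=2)
    idx, off = encode_action(demo[0], centers)
    print(idx, off, decode_action(idx, off, centers))
```

Under this split, the categorical distribution over bins is what lets a model represent several modes at once, while the offset restores the continuous precision lost to binning.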
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Human Pose and Action Recognition
Methods: Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dropout · Multi-Head Attention · Byte Pair Encoding · Label Smoothing · Residual Connection
