BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, \"Omer Erdin\c{c} Ya\u{g}murlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov

TL;DR
BEAST introduces a B-spline based action tokenizer that encodes action sequences into uniform, smooth, and efficient tokens, enabling faster and more reliable imitation learning across various models and tasks.
Contribution
BEAST is a novel action tokenizer that requires no separate training, produces uniform tokens, and ensures smooth trajectories, improving efficiency and performance in imitation learning.
Findings
Reduces training and inference computational costs.
Generates smooth, high-frequency control signals.
Achieves competitive success rates on benchmark tasks.
Abstract
We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently ensures generating smooth trajectories without discontinuities between adjacent segments. We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a pretrained Vision-Language Model with an encoder-decoder architecture, demonstrating BEAST's…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Robot Manipulation and Learning · Reinforcement Learning in Robotics
MethodsLinear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Adam · Attention Is All You Need · Softmax · Label Smoothing · Multi-Head Attention · Dropout
