Behavior Generation with Latent Actions
Seungjae Lee, Yibin Wang, Haritheja Etukuru, H. Jin Kim, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

TL;DR
VQ-BeT introduces hierarchical vector quantization into behavior transformers, enabling scalable, multimodal action generation with faster inference across diverse environments, surpassing previous models like BeT and Diffusion Policies.
Contribution
The paper presents VQ-BeT, a novel hierarchical vector quantization approach that enhances behavior generation models by improving scalability, mode capturing, and inference speed.
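To make "hierarchical vector quantization" concrete, here is a minimal sketch that assumes the hierarchy takes the form of residual quantization: a coarse codebook picks the nearest code, and a finer codebook then codes what is left over. The codebooks, the toy action, and the function names (`nearest_code`, `rvq_encode`, `rvq_decode`) are hypothetical and chosen only for illustration; in a trained model the codebooks would be learned rather than written by hand.

```python
import numpy as np

def nearest_code(x, codebook):
    """Return the index of the codebook row closest to x (Euclidean distance)."""
    dists = np.linalg.norm(codebook - x, axis=1)
    return int(np.argmin(dists))

def rvq_encode(action, codebooks):
    """Quantize layer by layer: each layer codes the residual left by the previous ones."""
    residual = np.asarray(action, dtype=float)
    codes = []
    for cb in codebooks:
        idx = nearest_code(residual, cb)
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct the action by summing the selected code from every layer."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

# Hand-written toy codebooks (assumption: real ones would be learned from data).
coarse = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
fine = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, -0.1]])

action = np.array([1.08, 0.97])
codes = rvq_encode(action, [coarse, fine])
recon = rvq_decode(codes, [coarse, fine])

coarse_only_error = np.linalg.norm(action - coarse[codes[0]])
two_layer_error = np.linalg.norm(action - recon)
print(codes, recon, coarse_only_error, two_layer_error)
```

The first code identifies the broad mode and the second refines it, so the pair of discrete indices is what a transformer head would predict in place of the raw continuous action.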
Findings
VQ-BeT outperforms BeT and Diffusion Policies in multiple environments.
VQ-BeT captures behavior modes more effectively.
Inference is 5x faster than with Diffusion Policies.
Abstract
Generative modeling of complex behaviors from labeled datasets has been a longstanding problem in decision making. Unlike language or image generation, decision making requires modeling actions - continuous-valued vectors that are multimodal in their distribution, potentially drawn from uncurated sources, where generation errors can compound in sequential prediction. A recent class of models called Behavior Transformers (BeT) addresses this by discretizing actions using k-means clustering to capture different modes. However, k-means struggles to scale for high-dimensional action spaces or long sequences, and lacks gradient information, and thus BeT suffers in modeling long-range actions. In this work, we present Vector-Quantized Behavior Transformer (VQ-BeT), a versatile model for behavior generation that handles multimodal action prediction, conditional generation, and partial…
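The k-means discretization that the abstract attributes to BeT can be sketched as follows. This is an illustrative toy, not the authors' implementation: the bimodal 2-D action data, the farthest-point initialization, and the names `farthest_point_init` and `kmeans` are assumptions made for the example. Each action is mapped to its nearest cluster center (a discrete bin that a classifier can predict), and the leftover offset is kept so the continuous action can be recovered.

```python
import numpy as np

def farthest_point_init(actions, k):
    """Deterministic init: start at the first point, then repeatedly add the farthest point."""
    centers = [actions[0]]
    for _ in range(k - 1):
        d = np.min(
            [np.linalg.norm(actions - c, axis=1) for c in centers], axis=0
        )
        centers.append(actions[int(np.argmax(d))])
    return np.array(centers, dtype=float)

def assign(actions, centers):
    """Index of the nearest center for every action."""
    dists = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(actions, k, iters=20):
    """Plain Lloyd's algorithm: alternate assignment and center update."""
    centers = farthest_point_init(actions, k)
    for _ in range(iters):
        labels = assign(actions, centers)
        for j in range(k):
            members = actions[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, assign(actions, centers)

# Toy bimodal dataset (assumption): half the demonstrations go left, half go right.
rng = np.random.default_rng(0)
left = rng.normal(loc=[-1.0, 0.0], scale=0.05, size=(200, 2))
right = rng.normal(loc=[1.0, 0.0], scale=0.05, size=(200, 2))
actions = np.vstack([left, right])

centers, labels = kmeans(actions, k=2)
offsets = actions - centers[labels]  # continuous residual around each discrete bin
print(centers)
```

Averaging the two modes would yield an action near the origin that no demonstration contains, which is why the discrete bin matters. The sketch also hints at the scaling issue raised in the abstract: the number of centers needed to cover high-dimensional or long-horizon actions grows quickly, and the assignment step provides no gradient.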
Code & Models
- 🤗 JayLee131/vqbet_pusht (model) · 6 dl · ♡ 2
- 🤗 JayLee131/vqbet_pusht2 (model) · 13 dl · ♡ 2
- 🤗 lerobot/vqbet_pusht (model) · 7.1k dl · ♡ 4
- 🤗 joaoocruz00/pi0_model_1 (model)
- 🤗 joaoocruz00/pi0_finetuned (model)
- 🤗 neilslt/smolvla_coffee_cleanup (model)
- 🤗 asyk454/smolvla_testFinetune_40k (model)
- 🤗 asyk454/smolvla_throwaway (model) · 5 dl
- 🤗 asyk454/svla_glasses_20k (model) · 2 dl
- 🤗 Mohamedal/vq_franka_stack-bowls (model) · 1 dl
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Speech and dialogue systems
Methods: Linear Layer · Layer Normalization · Byte Pair Encoding · Dropout · Multi-Head Attention · Attention Is All You Need · Softmax · Dense Connections · Label Smoothing · Diffusion
