TL;DR
QueST introduces a novel self-supervised learning framework that uses a quantized transformer to learn flexible, shareable skill abstractions from robot data, significantly improving multitask and few-shot control capabilities.
Contribution
The paper proposes QueST, a new architecture that gives latent skill representations a causal inductive bias, enabling better transfer and generalization in robot control tasks.
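One common way to impose a causal bias on a sequence model is a causal attention mask. Under such a mask, each timestep attends only to itself and to earlier steps. The sketch below assumes that reading. This excerpt does not say how QueST implements its causal bias, and `causal_self_attention` is a hypothetical single-head attention without learned projections, written only for illustration.

```python
import numpy as np


def causal_mask(t: int) -> np.ndarray:
    """Boolean (T, T) mask: entry (i, j) is True when step i may attend to step j, i.e. j <= i."""
    return np.tril(np.ones((t, t), dtype=bool))


def causal_self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention over a (T, D) sequence under a causal mask.

    Hypothetical simplification (not QueST's layer): queries, keys and values
    are all x itself, with no learned projections.
    """
    t, d = x.shape
    scores = x @ x.T / np.sqrt(d)                        # (T, T) scaled dot products
    scores = np.where(causal_mask(t), scores, -np.inf)   # block attention to future steps
    scores -= scores.max(axis=1, keepdims=True)          # stabilize the softmax
    weights = np.exp(scores)                             # masked entries become exactly 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x


# Usage: perturbing later steps must not change outputs for earlier steps.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
y = x.copy()
y[3:] += 10.0
print(np.allclose(causal_self_attention(x)[:3], causal_self_attention(y)[:3]))
```

The check shows the property a causal bias provides: the representation of an early step cannot depend on actions that occur after it.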
Findings
QueST outperforms state-of-the-art baselines on multitask benchmarks.
It learns more semantically meaningful and transferable skill representations.
QueST demonstrates strong few-shot learning performance.
Abstract
Generalization, or rather the lack thereof, is one of the most important unsolved problems in the field of robot learning, and while several large-scale efforts have set out to tackle this problem, it remains unsolved. In this paper, we hypothesize that learning temporal action abstractions using latent variable models (LVMs), which learn to map data to a compressed latent space and back, is a promising direction towards low-level skills that can readily be used for new tasks. Although several works have attempted to show this, they have generally been limited by architectures that do not faithfully capture shareable representations. To address this we present Quantized Skill Transformer (QueST), which learns a larger and more flexible latent encoding that is more capable of modeling the breadth of low-level skills necessary for a variety of tasks. To make use of this extra…
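The abstract describes LVMs that map data to a compressed latent space and back. The sketch below illustrates that round trip for a short chunk of robot actions. It makes several assumptions: a fixed averaging encoder, a hand-picked codebook with nearest-neighbour vector quantization, and a repeat-based decoder. The names `encode`, `quantize`, and `decode` are hypothetical. QueST learns its encoder, quantizer, and decoder from data, and their details are not given in this excerpt.

```python
import numpy as np

# Hypothetical stand-ins for learned components; not QueST's actual architecture.


def encode(actions: np.ndarray, stride: int) -> np.ndarray:
    """Compress a (T, A) action chunk into (T // stride, A) latents by averaging
    non-overlapping windows of `stride` timesteps."""
    t, a = actions.shape
    assert t % stride == 0, "sequence length must be divisible by stride"
    return actions.reshape(t // stride, stride, a).mean(axis=1)


def quantize(latents: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Snap each latent to its nearest codebook entry (squared L2 distance).
    Returns the discrete token ids and the quantized vectors."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    ids = dists.argmin(axis=1)
    return ids, codebook[ids]


def decode(quantized: np.ndarray, stride: int) -> np.ndarray:
    """Map quantized latents back to a (T, A) action chunk by repeating each one
    `stride` times along the time axis."""
    return np.repeat(quantized, stride, axis=0)


# Usage: eight 2-D actions become two discrete tokens and are reconstructed.
codebook = np.array([[0.0, 0.0], [1.0, -1.0], [0.5, 0.5]])  # hypothetical codebook
actions = np.vstack([np.tile([1.0, -1.0], (4, 1)), np.tile([0.5, 0.5], (4, 1))])
ids, q = quantize(encode(actions, stride=4), codebook)
print(ids, np.allclose(decode(q, stride=4), actions))
```

Here two discrete tokens summarize eight action steps. A learned model would instead fit the compression, the codebook, and the decoder so that the tokens capture reusable skills rather than simple window averages.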