Contrastive Language, Action, and State Pre-training for Robot Learning
Krishan Rana, Andrew Melnik, Niko Sünderhauf

TL;DR
This paper presents CLASP, a novel pre-training method that unifies language, action, and state representations in robotics, enabling improved zero-shot retrieval, captioning, and reinforcement learning tasks through distributional learning.
Contribution
The paper introduces a distributional pre-training approach that extends CLIP to capture the one-to-many relationships between language and behaviour in robotics.
Findings
Superior zero-shot retrieval and captioning performance on unseen datasets
Effective generation of meaningful exploratory behaviours from textual commands
Potential for broad generalization to various downstream robot learning tasks
Abstract
In this paper, we introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning. Our method, Contrastive Language, Action, and State Pre-training (CLASP), extends the CLIP formulation by incorporating distributional learning, capturing the inherent complexities and one-to-many relationships in behaviour-text alignment. By employing distributional outputs for both text and behaviour encoders, our model effectively associates diverse textual commands with a single behaviour and vice-versa. We demonstrate the utility of our method for the following downstream tasks: zero-shot text-behaviour retrieval, captioning unseen robot behaviours, and learning a behaviour prior for language-conditioned reinforcement learning. Our distributional encoders exhibit superior retrieval and captioning…
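To make the idea of distributional encoders inside a CLIP-style objective more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each encoder outputs the mean and log standard deviation of a diagonal Gaussian, draws one embedding per item with the reparameterisation trick, and then applies the standard symmetric cross-entropy loss used by CLIP. The Gaussian form, the temperature value, the toy vectors, and all function names (`sample_embedding`, `symmetric_contrastive_loss`, and so on) are illustrative assumptions; this summary does not specify CLASP's actual architecture or loss.

```python
import math
import random


def sample_embedding(mean, log_std, rng):
    """Draw z = mean + exp(log_std) * eps from a diagonal Gaussian (assumed form)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


def l2_normalise(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the row maximum for numerical stability
        log_partition = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_partition - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_emb, behaviour_emb, temperature=0.07):
    """CLIP-style loss: average of text-to-behaviour and behaviour-to-text terms."""
    text = [l2_normalise(v) for v in text_emb]
    behaviour = [l2_normalise(v) for v in behaviour_emb]
    logits = [
        [sum(a * b for a, b in zip(t, bh)) / temperature for bh in behaviour]
        for t in text
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy batch of three matched (text, behaviour) pairs. The means and log standard
# deviations stand in for encoder outputs and are made up for illustration.
rng = random.Random(0)
means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
log_stds = [[-3.0, -3.0, -3.0] for _ in means]

text_samples = [sample_embedding(m, s, rng) for m, s in zip(means, log_stds)]
behaviour_samples = [sample_embedding(m, s, rng) for m, s in zip(means, log_stds)]

aligned_loss = symmetric_contrastive_loss(text_samples, behaviour_samples)
# Rotating the behaviours by one position breaks every pairing.
mismatched = behaviour_samples[1:] + behaviour_samples[:1]
mismatched_loss = symmetric_contrastive_loss(text_samples, mismatched)

print(f"aligned loss:    {aligned_loss:.4f}")
print(f"mismatched loss: {mismatched_loss:.4f}")
```

In this sketch, a larger `log_std` spreads the samples of one item over a wider region of the embedding space, which is one way a single behaviour could sit close to several different textual commands. A real implementation would use learned neural encoders and a differentiable array framework rather than plain Python lists.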
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Contrastive Language-Image Pre-training
