CLIPort: What and Where Pathways for Robotic Manipulation

Mohit Shridhar; Lucas Manuelli; Dieter Fox

arXiv:2109.12098 · cs.RO · September 27, 2021 · 99 cites

TL;DR

CLIPort is a robotic manipulation framework that combines semantic understanding from CLIP with spatial reasoning to perform diverse language-guided tasks efficiently in both simulated and real environments.

Contribution

It introduces a two-stream architecture integrating semantic and spatial pathways, enabling generalizable, language-conditioned manipulation without explicit pose or symbolic representations.
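As a rough illustration of the two-stream idea, the sketch below fuses a language-conditioned semantic feature map with a spatial feature map and reads out a per-pixel score map. Everything in it is an assumption made for illustration: the function names `two_stream_affordance` and `best_pixel`, the toy feature shapes, conditioning by element-wise multiplication, fusion by addition, and the channel-sum readout are not taken from this summary and should not be read as the authors' implementation. Random arrays stand in for the outputs of real encoders.

```python
import numpy as np


def two_stream_affordance(semantic_feats, spatial_feats, lang_embedding):
    """Fuse two per-pixel feature maps into a normalized score map.

    semantic_feats, spatial_feats: arrays of shape (H, W, C)
    lang_embedding: array of shape (C,)
    All shapes and fusion choices here are hypothetical.
    """
    # Condition the semantic stream on language: broadcast the embedding
    # over every pixel and multiply element-wise (assumed choice).
    conditioned = semantic_feats * lang_embedding[None, None, :]
    # Fuse the two streams by addition (assumed choice).
    fused = conditioned + spatial_feats
    # Collapse channels to one score per pixel.
    logits = fused.sum(axis=-1)
    # Softmax over all pixels, shifted by the max for numerical stability.
    flat = logits.ravel()
    probs = np.exp(flat - flat.max())
    probs /= probs.sum()
    return probs.reshape(logits.shape)


def best_pixel(heatmap):
    """Return the (row, col) of the highest-scoring pixel."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(row), int(col)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    height, width, channels = 8, 8, 16
    semantic = rng.normal(size=(height, width, channels))  # stand-in for semantic features
    spatial = rng.normal(size=(height, width, channels))   # stand-in for spatial features
    language = rng.normal(size=channels)                   # stand-in for a text embedding
    heatmap = two_stream_affordance(semantic, spatial, language)
    print("selected pixel:", best_pixel(heatmap))
```

The point of the sketch is only that the action is chosen directly in pixel space from fused features, with no explicit object pose or symbolic state in between, which is the property the contribution statement highlights.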

Findings

01 Effective in few-shot learning scenarios
02 Generalizes to unseen semantic concepts
03 Single multi-task policy performs comparably to multiple single-task policies

Abstract

How can we imbue robots with the ability to manipulate objects precisely but also to reason about them in terms of abstract concepts? Recent works in manipulation have shown that end-to-end networks can learn dexterous skills that require precise spatial reasoning, but these methods often fail to generalize to new goals or quickly learn transferable concepts across tasks. In parallel, there has been great progress in learning generalizable semantic representations for vision and language by training on large-scale internet data; however, these representations lack the spatial understanding necessary for fine-grained manipulation. To this end, we propose a framework that combines the best of both worlds: a two-stream architecture with semantic and spatial pathways for vision-based manipulation. Specifically, we present CLIPort, a language-conditioned imitation-learning agent that combines…

Code & Models

Repositories

cliport/cliport (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: CLIPort · Contrastive Language-Image Pre-training