Spatial-Language Attention Policies for Efficient Robot Learning

Priyam Parashar; Vidhi Jain; Xiaohan Zhang; Jay Vakil; Sam Powers; Yonatan Bisk; Chris Paxton

arXiv:2304.11235 · cs.RO · November 8, 2023 · 1 citation

TL;DR

This paper introduces Spatial-Language Attention Policies (SLAP), a novel approach for mobile manipulation that uses 3D tokens and language-conditioned policies to improve robustness and efficiency in real-world tasks, even with limited data.

Contribution

SLAP combines three-dimensional input tokens with a single multi-task, language-conditioned policy for mobile manipulation, achieving high success rates with limited data.

Findings

01  80% success rate across eight real-world tasks with a single model

02  47.5% success rate with unseen clutter and unseen object configurations

03  4x improvement over baseline in mobile manipulation

Abstract

Despite great strides in language-guided manipulation, existing work has been constrained to table-top settings. Table-tops allow for perfect and consistent camera angles, properties that do not hold in mobile manipulation. Task plans that involve moving around the environment must be robust to egocentric views and to changes in the plane and angle of grasp. A further challenge is achieving this robustness while still learning skills efficiently from limited data. We propose Spatial-Language Attention Policies (SLAP) as a solution. SLAP uses three-dimensional tokens as the input representation to train a single multi-task, language-conditioned action prediction policy. Our method shows an 80% success rate in the real world across eight tasks with a single model, and a 47.5% success rate when unseen clutter and unseen object configurations are introduced, even with only a…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning