Spatial-Language Attention Policies for Efficient Robot Learning
Priyam Parashar, Vidhi Jain, Xiaohan Zhang, Jay Vakil, Sam Powers, Yonatan Bisk, Chris Paxton

TL;DR
This paper introduces Spatial-Language Attention Policies (SLAP), a novel approach for mobile manipulation that uses 3D tokens and language-conditioned policies to improve robustness and efficiency in real-world tasks, even with limited data.
Contribution
SLAP is the first method to effectively combine 3D spatial language tokens with multi-task learning for mobile robot manipulation, achieving high success rates with limited data.
Findings
80% success rate across eight real-world tasks with a single model
47.5% success rate with unseen clutter and configurations
4x improvement over baseline in mobile manipulation
Abstract
Despite great strides in language-guided manipulation, existing work has been constrained to table-top settings. Table-tops allow for perfect and consistent camera angles, properties that do not hold in mobile manipulation. Task plans that involve moving around the environment must be robust to egocentric views and changes in the plane and angle of grasp. A further challenge is ensuring this is all true while still being able to learn skills efficiently from limited data. We propose Spatial-Language Attention Policies (SLAP) as a solution. SLAP uses three-dimensional tokens as the input representation to train a single multi-task, language-conditioned action prediction policy. Our method shows an 80% success rate in the real world across eight tasks with a single model, and a 47.5% success rate when unseen clutter and unseen object configurations are introduced, even with only a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
