Learning Generalizable Feature Fields for Mobile Manipulation

Ri-Zhao Qiu; Yafei Hu; Yuchen Song; Ge Yang; Yang Fu; Jianglong Ye; Jiteng Mu; Ruihan Yang; Nikolay Atanasov; Sebastian Scherer; Xiaolong Wang

arXiv:2403.07563·cs.RO·November 27, 2024·1 cites

Open Access

TL;DR

This paper introduces GeFF, a neural scene representation that serves both navigation and manipulation on mobile robots. It is pre-trained with generative novel view synthesis and aligned with natural language through CLIP feature distillation, giving real-time, open-vocabulary scene understanding.

Contribution

GeFF is a scene-level generalizable neural feature field that combines geometry and semantics for mobile manipulation, supporting real-time, open-vocabulary navigation and manipulation from a single representation.

Findings

1. Outperforms point-based baselines in runtime and storage-accuracy trade-offs.
2. Enables semantics-aware navigation and articulated object manipulation.
3. Demonstrates effectiveness on a quadrupedal robot with manipulation capabilities.

Abstract

An open problem in mobile manipulation is how to represent objects and scenes in a unified manner so that robots can use the representation for both navigation and manipulation. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent in an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that serves as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We quantitatively evaluate GeFF's ability…
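The abstract's two language-related ideas, distilling CLIP features into the field and then querying the scene with free-form text, can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-based loss, the function names (`distillation_loss`, `query_points`), the threshold, and the random stand-in vectors are all assumptions made for illustration. In a real system the teacher features and the text embedding would come from a CLIP image and text encoder.

```python
import numpy as np


def normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def distillation_loss(pred_feats, teacher_feats):
    """Mean (1 - cosine similarity) between student and teacher features.

    Assumption: a cosine objective is one common choice for feature
    distillation; the abstract does not state which loss GeFF uses.
    """
    p = normalize(pred_feats)
    t = normalize(teacher_feats)
    return float(np.mean(1.0 - np.sum(p * t, axis=-1)))


def query_points(point_feats, text_embedding, threshold=0.5):
    """Score each 3D point's feature against a text embedding.

    Returns the cosine similarities and a boolean mask of points whose
    similarity exceeds `threshold` (a hypothetical cutoff).
    """
    sims = normalize(point_feats) @ normalize(text_embedding)
    return sims, sims > threshold


# Stand-in data: 10 points with 16-dimensional features (not real CLIP output).
rng = np.random.default_rng(0)
point_feats = rng.normal(size=(10, 16))

# Pretend the text query embeds exactly where point 3's feature lies.
text_embedding = point_feats[3].copy()
sims, mask = query_points(point_feats, text_embedding)
best_point = int(np.argmax(sims))
```

Because the distilled features live in the same space as the text embeddings, an open-vocabulary query reduces to a dot product per point, which is what makes this style of lookup cheap enough for real-time use.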

Taxonomy

Topics: Robot Manipulation and Learning · Human Motion and Animation · Advanced Vision and Imaging

Methods: Contrastive Language-Image Pre-training (CLIP) · ALIGN