Learning Generalizable Feature Fields for Mobile Manipulation
Ri-Zhao Qiu, Yafei Hu, Yuchen Song, Ge Yang, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, Xiaolong Wang

TL;DR
This paper introduces GeFF, a neural scene representation that serves both navigation and manipulation on mobile robots. It uses generative novel view synthesis as a pre-training task and CLIP feature distillation for real-time, open-vocabulary scene understanding.
Contribution
The work presents GeFF, a scene-level generalizable neural feature field that combines geometry and language-aligned semantics in a single representation for mobile manipulation, enabling real-time, open-vocabulary tasks.
Findings
Outperforms point-based baselines in runtime and storage-accuracy trade-offs.
Enables semantics-aware navigation and articulated object manipulation.
Demonstrates effectiveness on a quadrupedal robot with manipulation capabilities.
Abstract
An open problem in mobile manipulation is how to represent objects and scenes in a unified manner so that robots can use both for navigation and manipulation. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent at an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We quantitatively evaluate GeFF's ability…
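To make the open-vocabulary idea concrete, the sketch below shows one common way a language-aligned feature field can be queried: each 3D point carries a feature vector, and a text query is matched against those vectors by cosine similarity. This is an illustration only, not the authors' implementation. The function names, the threshold, and the toy 3-dimensional vectors (standing in for distilled CLIP features and a CLIP text embedding) are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_feature_field(point_features, text_embedding, threshold=0.5):
    """Return (point_index, score) pairs with score >= threshold, best match first.

    `point_features` is a hypothetical list of per-point feature vectors;
    `text_embedding` is a hypothetical embedding of the language query.
    """
    scored = [
        (i, cosine_similarity(feature, text_embedding))
        for i, feature in enumerate(point_features)
    ]
    hits = [(i, score) for i, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy values only: real CLIP features are much higher-dimensional.
point_features = [
    [1.0, 0.0, 0.0],  # point 0: aligned with the query
    [0.0, 1.0, 0.0],  # point 1: orthogonal to the query
    [0.9, 0.1, 0.0],  # point 2: close to the query
]
text_embedding = [1.0, 0.0, 0.0]

hits = query_feature_field(point_features, text_embedding)
for index, score in hits:
    print(f"point {index}: similarity {score:.3f}")
```

In this toy setup, the points that pass the threshold could be treated as the region matching the query, for example as a navigation goal or a grasp candidate; how GeFF actually uses such matches is not specified in the text above.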
Taxonomy
Topics: Robot Manipulation and Learning · Human Motion and Animation · Advanced Vision and Imaging
Methods: Contrastive Language-Image Pre-training · ALIGN
