g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks

Zihan Wang; Gim Hee Lee

arXiv:2411.17030·cs.CV·November 27, 2024

TL;DR

g3D-LF is a pre-trained 3D representation model that encodes multi-scale, multi-view features aligned with language, enabling improved embodied task performance in unseen environments.

Contribution

We propose g3D-LF, a novel 3D-language feature field model trained on large-scale data for generalizable embodied task applications.

Findings

01. Effective in Vision-and-Language Navigation tasks

02. Enables zero-shot object navigation

03. Improves situated question answering accuracy

Abstract

We introduce Generalizable 3D-Language Feature Fields (g3D-LF), a 3D representation model pre-trained on a large-scale 3D-language dataset for embodied tasks. g3D-LF processes posed RGB-D images from agents to encode feature fields that support: 1) predicting novel-view representations from any position in the 3D scene; 2) generating BEV maps centered on the agent; 3) querying targets with multi-granularity language within these representations. The representation generalizes to unseen environments and supports real-time construction and dynamic updates. By volume rendering latent features along sampled rays and integrating semantic and spatial relationships through multiscale encoders, g3D-LF produces representations at different scales and perspectives that are aligned with multi-granularity language via multi-level contrastive learning. Furthermore, we prepare a…
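
The abstract mentions volume rendering latent features along sampled rays and querying the result with language. The sketch below illustrates that idea with the standard NeRF-style quadrature, which is an assumption here: the abstract does not give the paper's exact formulation, and the names `render_feature` and `cosine_similarity` are hypothetical, not taken from the g3D-LF repository.

```python
import math


def render_feature(densities, deltas, features):
    """Composite per-sample latent features along one ray.

    Assumed NeRF-style quadrature (not confirmed by the abstract):
    each sample i has density sigma_i, segment length delta_i, and a
    feature vector f_i. Its weight is T_i * (1 - exp(-sigma_i * delta_i)),
    where T_i is the transmittance accumulated before sample i.
    """
    dim = len(features[0])
    rendered = [0.0] * dim
    weights = []
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for sigma, delta, feat in zip(densities, deltas, features):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(dim):
            rendered[k] += weight * feat[k]
        transmittance *= 1.0 - alpha
    return rendered, weights


def cosine_similarity(a, b):
    """Score a rendered feature against a (hypothetical) text embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy ray: empty space, then a nearly opaque surface, then an occluded sample.
densities = [0.0, 1e9, 1.0]
deltas = [1.0, 1.0, 1.0]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

rendered, weights = render_feature(densities, deltas, features)
score = cosine_similarity(rendered, [0.0, 2.0])  # toy "text" embedding
```

In this toy ray the opaque middle sample takes almost all of the weight, so the rendered feature is dominated by the surface's feature and the occluded sample behind it contributes almost nothing. A contrastive objective would then pull such rendered features toward the embeddings of matching text.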

Code & Models

Repositories

MrZihan/g3D-LF
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications

Methods: ALIGN