LERF: Language Embedded Radiance Fields

Justin Kerr; Chung Min Kim; Ken Goldberg; Angjoo Kanazawa; Matthew Tancik

arXiv:2303.09553 · cs.CV · March 17, 2023 · 5 cites

TL;DR

LERF integrates language embeddings into NeRF to enable real-time, open-ended 3D language queries, supporting interactive scene understanding without region proposals.

Contribution

This work introduces LERF, a novel method for embedding language into NeRF for zero-shot, multi-view consistent 3D language querying.

Findings

01 Supports real-time 3D relevancy maps for language prompts

02 Enables open-vocabulary, pixel-aligned 3D queries

03 Does not rely on region proposals or masks
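A relevancy map assigns each rendered pixel a score for how well its language embedding matches a text prompt. The sketch below is a minimal NumPy illustration, assuming the pairwise-softmax scoring described in the LERF paper: the query is compared against a few canonical phrases (the paper uses generic words such as "object", "things", "stuff", and "texture"), and the worst-case score is kept. Random unit vectors stand in for real CLIP text embeddings, and `normalize` and `relevancy` are hypothetical helper names, not part of any released API.

```python
import numpy as np

# Hypothetical helpers: a sketch of relevancy scoring, not the LERF codebase.

def normalize(v, axis=-1):
    """Project vectors onto the unit sphere (CLIP embeddings are compared by cosine similarity)."""
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def relevancy(lang_emb, query_emb, canon_embs):
    """Per-pixel relevancy of a text query.

    lang_emb:   (N, D) rendered language embeddings, one per pixel, unit-normalized
    query_emb:  (D,)   text embedding of the query, unit-normalized
    canon_embs: (K, D) text embeddings of canonical phrases, unit-normalized
    Returns (N,) scores in (0, 1).
    """
    q = lang_emb @ query_emb            # (N,) similarity to the query
    c = lang_emb @ canon_embs.T         # (N, K) similarity to each canonical phrase
    # Pairwise softmax exp(q) / (exp(c_i) + exp(q)), rewritten as 1 / (1 + exp(c_i - q)) for stability.
    pairwise = 1.0 / (1.0 + np.exp(c - q[:, None]))   # (N, K)
    # Keep the worst case over canonical phrases.
    return pairwise.min(axis=1)

rng = np.random.default_rng(0)
D = 8
canon = normalize(rng.normal(size=(4, D)))   # stand-ins for the canonical phrase embeddings
query = normalize(rng.normal(size=D))        # stand-in for the query text embedding
# Two synthetic pixels: one whose embedding equals the query, one equal to a canonical phrase.
pixels = np.stack([query, canon[0]])
scores = relevancy(pixels, query, canon)
print(scores)
```

Because each score is a pairwise softmax, 0.5 marks an embedding that is no closer to the query than to some canonical phrase; taking the minimum over phrases makes the score conservative, so only pixels that prefer the query over every generic phrase score above 0.5.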

Abstract

Humans describe the physical world using natural language to refer to specific 3D locations based on a vast range of properties: visual appearance, semantics, abstract associations, or actionable affordances. In this work we propose Language Embedded Radiance Fields (LERFs), a method for grounding language embeddings from off-the-shelf models like CLIP into NeRF, which enables these types of open-ended language queries in 3D. LERF learns a dense, multi-scale language field inside NeRF by volume rendering CLIP embeddings along training rays, supervising these embeddings across training views to provide multi-view consistency and smooth the underlying language field. After optimization, LERF can extract 3D relevancy maps for a broad range of language prompts interactively in real time, which has potential use cases in robotics, understanding vision-language models, and interacting with 3D…
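The key step in the abstract, volume rendering CLIP embeddings along training rays, can be illustrated with NeRF's standard compositing weights. This is a minimal NumPy sketch, assuming the usual quadrature w_i = T_i (1 - exp(-σ_i δ_i)) with T_i = exp(-Σ_{j<i} σ_j δ_j), a rendered embedding renormalized to unit length, and a negative-cosine-similarity loss against a CLIP target. The densities, sample spacing, per-sample embeddings, and target are synthetic placeholders; in LERF the per-sample embeddings come from a language field that is also conditioned on a physical scale, which this sketch omits. `render_weights`, `render_language_embedding`, and `cosine_loss` are hypothetical names.

```python
import numpy as np

# Hypothetical helpers: a sketch of embedding volume rendering, not the LERF codebase.

def render_weights(sigmas, deltas):
    """NeRF compositing weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                       # per-sample opacity
    # Transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j), via an exclusive cumulative sum.
    accum = np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]])
    return np.exp(-accum) * alphas

def render_language_embedding(sigmas, deltas, sample_embs):
    """Composite per-sample embeddings (S, D) along one ray into a unit-length (D,) vector."""
    w = render_weights(sigmas, deltas)                            # (S,)
    emb = (w[:, None] * sample_embs).sum(axis=0)                  # weighted sum over samples
    return emb / np.linalg.norm(emb)                              # back onto the unit sphere

def cosine_loss(rendered, target):
    """Negative cosine similarity between the rendered embedding and a CLIP target."""
    target = target / np.linalg.norm(target)
    return -float(rendered @ target)

rng = np.random.default_rng(0)
S, D = 64, 8
sigmas = np.zeros(S)
sigmas[40] = 50.0                      # a single dense "surface" at sample 40
deltas = np.full(S, 0.05)              # uniform spacing between samples
sample_embs = rng.normal(size=(S, D))  # synthetic per-sample language embeddings
rendered = render_language_embedding(sigmas, deltas, sample_embs)
target = sample_embs[40]               # synthetic CLIP target for this ray
loss = cosine_loss(rendered, target)
print(loss)  # ≈ -1.0: all weight sits on sample 40, whose embedding is the target
```

Reusing the density-derived weights ties the language embedding to the same geometry as color, so supervising it from many training views pushes the field toward the multi-view consistency the abstract describes.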

Code & Models

Datasets

joshir/3D-Scene-Segmentation-HQ
dataset· 13 dl

Videos

LERF: Language Embedded Radiance Fields · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications

Methods: Contrastive Language-Image Pre-training