LERF: Language Embedded Radiance Fields
Justin Kerr, Chung Min Kim, Ken Goldberg, Angjoo Kanazawa, Matthew Tancik

TL;DR
LERF grounds language embeddings from models like CLIP in NeRF, enabling open-ended language queries in 3D that run interactively in real time and require no region proposals.
Contribution
This work introduces LERF, a novel method for embedding language into NeRF for zero-shot, multi-view consistent 3D language querying.
Findings
Supports real-time 3D relevancy maps for language prompts
Enables open-vocabulary, pixel-aligned 3D queries
Does not rely on region proposals or masks
Abstract
Humans describe the physical world using natural language to refer to specific 3D locations based on a vast range of properties: visual appearance, semantics, abstract associations, or actionable affordances. In this work we propose Language Embedded Radiance Fields (LERFs), a method for grounding language embeddings from off-the-shelf models like CLIP into NeRF, which enable these types of open-ended language queries in 3D. LERF learns a dense, multi-scale language field inside NeRF by volume rendering CLIP embeddings along training rays, supervising these embeddings across training views to provide multi-view consistency and smooth the underlying language field. After optimization, LERF can extract 3D relevancy maps for a broad range of language prompts interactively in real-time, which has potential use cases in robotics, understanding vision-language models, and interacting with 3D…
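The abstract describes volume rendering CLIP embeddings along rays and then scoring them against a language prompt. A minimal sketch of that idea follows, assuming the standard NeRF volume-rendering weights and plain cosine similarity as the score. The function names (`render_weights`, `render_embedding`, `cosine_relevancy`) are hypothetical, the random vectors stand in for real CLIP embeddings, and the relevancy score used in the paper may differ from the simple cosine similarity shown here.

```python
import numpy as np

def render_weights(sigmas, deltas):
    """Standard NeRF volume-rendering weights for the samples along one ray."""
    # Opacity contributed by each sample interval.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of light reaching sample i without being absorbed.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas[:-1])])
    return trans * alphas

def render_embedding(sigmas, deltas, embeddings):
    """Weighted sum of per-sample embeddings, normalized to unit length."""
    w = render_weights(sigmas, deltas)
    e = (w[:, None] * embeddings).sum(axis=0)
    return e / (np.linalg.norm(e) + 1e-8)

def cosine_relevancy(rendered, text_embedding):
    """Assumed scoring rule: cosine similarity between ray and prompt embeddings."""
    t = text_embedding / np.linalg.norm(text_embedding)
    return float(rendered @ t)

# Toy data: random vectors stand in for CLIP embeddings (illustrative only).
rng = np.random.default_rng(0)
n_samples, dim = 16, 8
sigmas = rng.uniform(0.0, 5.0, n_samples)      # densities along the ray
deltas = np.full(n_samples, 0.1)               # spacing between samples
embeddings = rng.normal(size=(n_samples, dim))

rendered = render_embedding(sigmas, deltas, embeddings)
score = cosine_relevancy(rendered, embeddings[0])
```

Computing such a score for every pixel's ray in a view would yield a 2D relevancy map for the prompt. In this sketch the language embedding is queried at a single scale, whereas the abstract describes a multi-scale field.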
Videos
LERF: Language Embedded Radiance Fields · YouTube
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Contrastive Language-Image Pre-training
