Open-NeRF: Towards Open Vocabulary NeRF Decomposition
Hao Zhang, Fang Li, and Narendra Ahuja

TL;DR
Open-NeRF introduces a method that combines large-scale segmentation models with hierarchical embeddings to enable accurate 3D object decomposition from open vocabulary queries, improving flexibility and precision in NeRF applications.
Contribution
The paper presents Open-NeRF, a novel approach that integrates foundation models with hierarchical embeddings for open-vocabulary NeRF decomposition, enhancing both flexibility and accuracy.
Findings
Outperforms state-of-the-art methods like LERF and FFD in open-vocabulary scenarios.
Achieves consistent object recognition across viewpoints, even with occlusions.
Enables applications in robotics and vision-language interaction in open-world 3D scenes.
Abstract
In this paper, we address the challenge of decomposing Neural Radiance Fields (NeRF) into objects from an open vocabulary, a critical task for object manipulation in 3D reconstruction and view synthesis. Current techniques for NeRF decomposition involve a trade-off between the flexibility of processing open-vocabulary queries and the accuracy of 3D segmentation. We present Open-vocabulary Embedded Neural Radiance Fields (Open-NeRF), which leverage large-scale, off-the-shelf segmentation models such as the Segment Anything Model (SAM) and introduce an integrate-and-distill paradigm with hierarchical embeddings to achieve both the flexibility of open-vocabulary querying and 3D segmentation accuracy. Open-NeRF first utilizes large-scale foundation models to generate hierarchical 2D mask proposals from varying viewpoints. These proposals are then aligned via tracking approaches and integrated…
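The abstract is cut off before it explains how a query is resolved against the learned embeddings, so the snippet below is only a generic illustration of embedding-based open-vocabulary selection, not the authors' method. It assumes each 3D sample point already carries an embedding vector and that the text query has been mapped into the same space; the function names, the toy vectors, and the 0.8 threshold are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Guard against zero-length vectors to avoid division by zero.
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def segment_by_query(point_embeddings, query_embedding, threshold=0.8):
    """Return a boolean mask: True where a point's embedding matches the query.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return [cosine(e, query_embedding) >= threshold for e in point_embeddings]

# Toy data: three sample points with hypothetical 3-dimensional embeddings.
points = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]  # stand-in for an embedded text query
mask = segment_by_query(points, query)
print(mask)
```

In this toy setup the first two points are selected and the third is not, since only the first two embeddings point in nearly the same direction as the query.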
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Multimodal Machine Learning Applications
