Cues3D: Unleashing the Power of Sole NeRF for Consistent and Unique Instances in Open-Vocabulary 3D Panoptic Segmentation
Feng Xue, Wenzhuang Xu, Guofeng Zhong, Anlong Ming, Nicu Sebe

TL;DR
Cues3D introduces a NeRF-based framework for open-vocabulary 3D panoptic segmentation that achieves consistent and unique object instance identification across views without relying on pre-associations.
Contribution
The paper presents a NeRF-only approach that combines a three-phase training framework with instance disambiguation to produce globally consistent 3D instance IDs, outperforming 2D image-based methods and some 2D-3D merging methods.
Findings
Outperforms 2D image-based methods on multiple datasets.
Achieves high consistency and uniqueness in 3D instance IDs.
Surpasses some 2D-3D merging methods, especially with additional 3D data.
Abstract
Open-vocabulary 3D panoptic segmentation has recently emerged as a significant trend. Top-performing methods currently integrate 2D segmentation with geometry-aware 3D primitives. However, this advantage is lost when high-fidelity 3D point clouds are unavailable, as in methods based on Neural Radiance Fields (NeRF). These methods have limited capacity to maintain consistency across partial observations. To address this, recent works have utilized contrastive loss or cross-view association pre-processing for view consensus. In contrast, we present Cues3D, a compact approach that relies solely on NeRF instead of pre-associations. The core idea is that NeRF's implicit 3D field inherently establishes a globally consistent geometry, enabling effective object distinction without explicit cross-view supervision. We propose a three-phase training framework for NeRF,…
