Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space

Hyunjee Lee; Youngsik Yun; Jeongmin Bae; Seoha Kim; Youngjung Uh

arXiv:2408.07416·cs.CV·February 24, 2025

TL;DR

This paper recasts open-vocabulary segmentation of radiance fields as segmentation of the 3D volume itself, rather than of rendered 2D masks, and transfers the learned language field to 3DGS for real-time rendering.

Contribution

It redefines the task as segmenting the 3D volume, supervises the language embedding field directly at 3D points instead of 2D pixels, transfers that field to 3DGS for real-time rendering, and introduces a 3D querying and evaluation protocol.

Findings

01 First real-time rendering of a language field, obtained by transferring it to 3DGS.

02 Directly supervising 3D points improves segmentation accuracy.

03 New 3D querying and evaluation protocol that assesses geometry and semantics jointly.

Abstract

Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are rendered as 2D masks that do not represent the entire 3D space. To address this limitation, we redefine the problem to segment the 3D volume and propose the following methods for better 3D understanding. We directly supervise the 3D points to train the language embedding field, unlike previous methods that anchor supervision at 2D pixels. We transfer the learned language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. Lastly, we introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics…
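
As a rough illustration of what querying and scoring in 3D (rather than on rendered 2D masks) could look like, the sketch below selects 3D points whose language embedding is close to a text embedding and scores the selection with an intersection-over-union over points. It is a minimal sketch under stated assumptions, not the authors' protocol: the function names, the cosine-similarity threshold, and the toy two-dimensional embeddings are all hypothetical, and the abstract as shown here is truncated before it describes the actual procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_points(point_embeddings, text_embedding, threshold=0.5):
    """Return one boolean per 3D point: True if it matches the text query.

    Hypothetical helper: the threshold rule is an assumption made for
    illustration, not something stated in the source.
    """
    return [
        cosine_similarity(embedding, text_embedding) >= threshold
        for embedding in point_embeddings
    ]


def iou_3d(predicted, ground_truth):
    """Intersection-over-union between two per-point boolean masks."""
    intersection = sum(p and g for p, g in zip(predicted, ground_truth))
    union = sum(p or g for p, g in zip(predicted, ground_truth))
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return intersection / union if union else 1.0


# Toy data: four 3D points with made-up 2-D "language" embeddings.
point_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.6, 0.8]]
text_embedding = [1.0, 0.0]              # made-up embedding of the text query
ground_truth = [True, True, False, True]  # made-up per-point labels

predicted = query_points(point_embeddings, text_embedding, threshold=0.7)
score = iou_3d(predicted, ground_truth)
print(predicted, score)
```

The point of the sketch is only that the mask lives on 3D points, so the score reflects how much of the object's volume is recovered and does not depend on any particular camera view.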

Taxonomy

Topics: 3D Surveying and Cultural Heritage

Methods: Sparse Evolutionary Training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings