Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding
Wenbo Zhang, Lu Zhang, Ping Hu, Liqian Ma, Yunzhi Zhuge, Huchuan Lu

TL;DR
FreeGS introduces an unsupervised framework for view-consistent 3D scene understanding with Gaussian Splatting. It avoids reliance on 2D labels and complex data preparation while achieving results competitive with state-of-the-art methods.
Contribution
The paper proposes FreeGS, a novel unsupervised 3D Gaussian Splatting method that embeds semantics without 2D supervision, enhancing view consistency and simplifying data requirements.
Findings
Performs comparably to state-of-the-art methods
Avoids complex 2D data preprocessing
Enables tasks like semantic segmentation and object detection
Abstract
Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundational models (e.g., CLIP and SAM) to facilitate novel view segmentation and semantic understanding, their heavy reliance on 2D supervision can undermine cross-view semantic consistency and necessitate complex data preparation processes, therefore hindering view-consistent scene understanding. In this work, we present FreeGS, an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. Instead of directly learning semantic features, we introduce the IDentity-coupled Semantic Field (IDSF) into 3DGS, which captures both semantic representations and view-consistent instance indices for each Gaussian. We optimize IDSF with a two-step alternating…
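The abstract describes a per-Gaussian field that stores both a semantic feature and an instance index, optimized by alternating between two steps. The text is cut off before those steps are specified, so the following is only an assumption-laden illustration of the general idea of alternating clustering and feature refinement, not FreeGS's actual IDSF optimization. It treats each Gaussian as a feature vector. Step 1 assigns instance indices by nearest-centroid clustering, a swapped-in k-means-style step. Step 2 pulls each feature toward its instance centroid. The function names, the farthest-point initialization, and the `step` parameter are all hypothetical.

```python
import numpy as np


def farthest_point_init(features, k):
    """Hypothetical deterministic init: pick k mutually distant features as centroids."""
    idx = [0]
    dist = ((features - features[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(dist.argmax())  # feature farthest from all chosen centroids
        idx.append(nxt)
        dist = np.minimum(dist, ((features - features[nxt]) ** 2).sum(axis=1))
    return features[idx].copy()


def assign_instances(features, centroids):
    """Step 1 (assumed): give each Gaussian the index of its nearest centroid."""
    # Squared distances with shape (num_gaussians, num_instances).
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def update_centroids(features, ids, centroids):
    """Recompute each centroid as the mean feature of its assigned Gaussians."""
    new = centroids.copy()
    for k in range(len(centroids)):
        members = features[ids == k]
        if len(members):  # keep the old centroid if an instance lost all members
            new[k] = members.mean(axis=0)
    return new


def pull_features(features, ids, centroids, step=0.5):
    """Step 2 (assumed): move each feature a fraction `step` toward its centroid."""
    return features + step * (centroids[ids] - features)


def alternate(features, num_instances, iters=10, step=0.5):
    """Alternate instance assignment and feature refinement; returns (features, ids)."""
    centroids = farthest_point_init(features, num_instances)
    for _ in range(iters):
        ids = assign_instances(features, centroids)
        centroids = update_centroids(features, ids, centroids)
        features = pull_features(features, ids, centroids, step)
    # Final instance indices under the refined features.
    return features, assign_instances(features, centroids)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic, well-separated groups of 4-D per-Gaussian features.
    feats = np.vstack([rng.normal(0.0, 0.1, (5, 4)), rng.normal(10.0, 0.1, (5, 4))])
    refined, ids = alternate(feats, num_instances=2)
    print(ids)
```

The two steps reinforce each other in this toy version. Clustering yields instance indices, and pulling features toward their instance centroid makes features within an instance more compact, which in turn stabilizes the next assignment. Any real method would replace both steps with learned, rendering-based objectives.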
Taxonomy
Topics: Image Processing and 3D Reconstruction · Advanced Vision and Imaging · 3D Surveying and Cultural Heritage
Methods: ADaptive gradient method with the OPTimal convergence rate · Contrastive Language-Image Pre-training
