SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians
Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, Stefano Gasperini, Hendrik P.A. Lensch, Nassir Navab, Federico Tombari

TL;DR
SuperGSeg introduces a hierarchical 3D scene representation using structured super-Gaussians, enabling efficient open-vocabulary segmentation and language feature integration with moderate memory use.
Contribution
It proposes a novel hierarchical scene representation with super-Gaussians that distill 2D language features into 3D, improving open-vocabulary segmentation efficiency.
Findings
Achieves state-of-the-art results in open-vocabulary object segmentation.
Demonstrates effective 3D scene understanding with moderate GPU memory.
Outperforms existing methods in semantic segmentation tasks.
Abstract
3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While its vanilla representation is mainly designed for view synthesis, recent works extended it to scene understanding with language features. However, storing additional high-dimensional features per Gaussian for semantic information is memory-intensive, which limits their ability to segment and interpret challenging scenes. To this end, we introduce SuperGSeg, a novel approach that fosters cohesive, context-aware hierarchical scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural 3D Gaussians to learn geometry, instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of Super-Gaussians. Super-Gaussians…
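The memory argument in the abstract can be made concrete with a small back-of-the-envelope sketch. It assumes, for illustration only, that high-dimensional language features are stored once per Super-Gaussian while each Gaussian keeps a low-dimensional segmentation feature and an index to its Super-Gaussian. The function names (`feature_memory_bytes`, `query_gaussians`) and all sizes (one million Gaussians, one thousand Super-Gaussians, 512-dimensional language features, 16-dimensional segmentation features) are hypothetical and not taken from the paper.

```python
import numpy as np


def feature_memory_bytes(num_items, feat_dim, bytes_per_value=4):
    """Bytes needed to store one float32 feature vector per item."""
    return num_items * feat_dim * bytes_per_value


# Hypothetical sizes, chosen only for illustration (not from the paper).
num_gaussians = 1_000_000
num_super = 1_000
lang_dim = 512  # assumed size of a language embedding
seg_dim = 16    # assumed size of a per-Gaussian segmentation feature

# Baseline: a full language feature stored on every Gaussian.
dense = feature_memory_bytes(num_gaussians, lang_dim)

# Sparse layout: small per-Gaussian feature, language features only on
# Super-Gaussians, plus one int32 assignment index per Gaussian.
sparse = (
    feature_memory_bytes(num_gaussians, seg_dim)
    + feature_memory_bytes(num_super, lang_dim)
    + num_gaussians * 4
)


def query_gaussians(assignment, super_lang_feats, text_embedding):
    """Score each Gaussian against a text query via its Super-Gaussian.

    assignment: (N,) integer index of the Super-Gaussian for each Gaussian.
    super_lang_feats: (S, D) language feature per Super-Gaussian.
    text_embedding: (D,) embedding of the text query.
    Returns an (N,) array of cosine similarities.
    """
    feats = super_lang_feats / np.linalg.norm(super_lang_feats, axis=1, keepdims=True)
    text = text_embedding / np.linalg.norm(text_embedding)
    super_scores = feats @ text      # one score per Super-Gaussian
    return super_scores[assignment]  # broadcast scores back to Gaussians


if __name__ == "__main__":
    print(f"per-Gaussian language features: {dense / 1e6:.1f} MB")
    print(f"Super-Gaussian layout:          {sparse / 1e6:.1f} MB")
```

Under these assumed sizes, the similarity to a text query is computed once per Super-Gaussian and then propagated to member Gaussians by indexing, so the query cost also scales with the number of Super-Gaussians rather than the number of Gaussians.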
Taxonomy
Topics: Image Processing and 3D Reconstruction · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
Methods: Sparse Evolutionary Training
