SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians

Siyun Liang; Sen Wang; Kunyi Li; Michael Niemeyer; Stefano Gasperini; Hendrik P.A. Lensch; Nassir Navab; Federico Tombari

arXiv:2412.10231·cs.CV·January 21, 2026

TL;DR

SuperGSeg introduces a hierarchical 3D scene representation using structured super-Gaussians, enabling efficient open-vocabulary segmentation and language feature integration with moderate memory use.

Contribution

The paper proposes a novel hierarchical scene representation in which super-Gaussians distill 2D language features into 3D, improving the efficiency of open-vocabulary segmentation.

Findings

01 Achieves state-of-the-art results in open-vocabulary object segmentation.

02 Demonstrates effective 3D scene understanding with moderate GPU memory.

03 Outperforms existing methods in semantic segmentation tasks.

Abstract

3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While its vanilla representation is mainly designed for view synthesis, recent works extended it to scene understanding with language features. However, storing additional high-dimensional features per Gaussian for semantic information is memory-intensive, which limits their ability to segment and interpret challenging scenes. To this end, we introduce SuperGSeg, a novel approach that fosters cohesive, context-aware hierarchical scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural 3D Gaussians to learn geometry, instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of Super-Gaussians. Super-Gaussians…
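
The memory argument in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the scene size, the number of Super-Gaussians, the feature dimension, and the idea of storing one integer index per Gaussian are all assumptions chosen only to show why attaching high-dimensional language features to a sparse set of primitives is cheaper than attaching them to every Gaussian.

```python
# Illustrative memory arithmetic only. Every number below is an assumption,
# not a figure reported by the paper.

BYTES_PER_FLOAT32 = 4


def feature_memory_bytes(num_primitives: int, feature_dim: int) -> int:
    """Bytes needed to store one float32 feature vector per primitive."""
    return num_primitives * feature_dim * BYTES_PER_FLOAT32


def index_memory_bytes(num_gaussians: int, bytes_per_index: int = 4) -> int:
    """Bytes needed to store one Super-Gaussian index per Gaussian.

    The hard one-index-per-Gaussian assignment is a simplifying assumption.
    """
    return num_gaussians * bytes_per_index


num_gaussians = 1_000_000    # hypothetical scene size
num_super_gaussians = 1_000  # hypothetical size of the sparse set
feature_dim = 512            # hypothetical language feature dimension

# Option A: a language feature stored on every Gaussian.
per_gaussian = feature_memory_bytes(num_gaussians, feature_dim)

# Option B: features stored only on the sparse set, plus one index per Gaussian.
per_super = (
    feature_memory_bytes(num_super_gaussians, feature_dim)
    + index_memory_bytes(num_gaussians)
)

print(f"per-Gaussian features:   {per_gaussian / 2**20:.1f} MiB")
print(f"Super-Gaussian features: {per_super / 2**20:.1f} MiB")
```

Under these assumed sizes the per-Gaussian cost grows with the product of scene size and feature dimension, while the sparse variant grows with the feature dimension only through the much smaller set.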

Taxonomy

Topics: Image Processing and 3D Reconstruction · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques

Methods: Sparse Evolutionary Training