Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation
Myrna C. Silva, Mahtab Dahaghin, Matteo Toso, Alessio Del Bue

TL;DR
This paper presents Contrastive Gaussian Clustering, a novel method for 3D scene segmentation that leverages Gaussian modeling and contrastive learning to produce view-consistent masks with high accuracy.
Contribution
It introduces a new approach combining Gaussian modeling and contrastive learning for weakly supervised 3D scene segmentation and multi-view mask consistency.
Findings
Improves IoU accuracy by 8% over state-of-the-art methods.
Capable of generating view-consistent 3D segmentation masks.
Effective even with inconsistent 2D supervision masks.
Abstract
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting the Gaussians onto it before blending their color. Following this example, we train a model to also include a segmentation feature vector for each Gaussian. These can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors; and to generate 2D segmentation masks, by projecting the Gaussians on a plane and blending over their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks,…
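To make the two mechanisms in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. `blend_features` shows front-to-back alpha blending of per-Gaussian segmentation features along a single ray, the same way colors are blended in Gaussian splatting. `contrastive_loss` is a generic margin-based contrastive loss swapped in for illustration; the paper's actual loss and its spatial regularization are not reproduced here. All function names, the `margin` parameter, and the array shapes are assumptions.

```python
import numpy as np


def blend_features(alphas, features):
    """Alpha-blend per-Gaussian feature vectors along one ray (hypothetical helper).

    alphas:   (N,) opacity of each projected Gaussian, sorted near to far.
    features: (N, D) segmentation feature vector of each Gaussian.
    Returns the (D,) blended feature for the pixel.
    """
    alphas = np.asarray(alphas, dtype=float)
    features = np.asarray(features, dtype=float)
    # Transmittance reaching Gaussian i: product of (1 - alpha) of all Gaussians in front.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ features


def contrastive_loss(pixel_features, mask_ids, margin=1.0):
    """Generic margin-based contrastive loss (an assumption, not the paper's loss).

    pixel_features: (P, D) rendered features for P pixels of ONE view.
    mask_ids:       (P,) 2D mask label of each pixel in that same view.
    Pixels sharing a mask are pulled together; pixels in different masks
    are pushed at least `margin` apart.
    """
    f = np.asarray(pixel_features, dtype=float)
    ids = np.asarray(mask_ids)
    # Pairwise Euclidean distances between all pixel features.
    dist = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)
    same = ids[:, None] == ids[None, :]
    pos = same & ~np.eye(len(ids), dtype=bool)  # same mask, excluding self-pairs
    neg = ~same                                 # different masks
    pull = (dist[pos] ** 2).mean() if pos.any() else 0.0
    push = (np.maximum(0.0, margin - dist[neg]) ** 2).mean() if neg.any() else 0.0
    return pull + push
```

Because the loss in this sketch only compares mask labels of pixels within a single view, it never needs a label to mean the same object in two different views, which is one plausible reading of how training on inconsistent 2D masks can work.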
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Video Analysis and Summarization
Methods: Contrastive Learning
