CUS3D: CLIP-based Unsupervised 3D Segmentation via Object-level Denoise

Fuyang Yu; Runze Tian; Zhen Wang; Xiaochuan Wang; Xiaohui Liang

arXiv:2409.13982 · cs.CV · October 14, 2024

TL;DR

CUS3D introduces a novel unsupervised 3D segmentation framework that leverages CLIP's semantic knowledge, employing object-level denoising and multimodal distillation to improve accuracy and open-vocabulary capabilities.

Contribution

The paper presents a new distillation learning framework with an object-level denoising module for more accurate 3D features and better alignment with CLIP's semantic space.

Findings

01 Outperforms previous methods in unsupervised segmentation accuracy
02 Effective in open-vocabulary semantic segmentation tasks
03 Demonstrates robustness across diverse 3D datasets

Abstract

To ease the difficulty of acquiring annotation labels for 3D data, a common approach is unsupervised and open-vocabulary semantic segmentation, which leverages 2D CLIP semantic knowledge. In this paper, unlike previous research that ignores the "noise" introduced during feature projection from 2D to 3D, we propose a novel distillation learning framework named CUS3D. In our approach, an object-level denoising projection module is designed to screen out the "noise" and ensure more accurate 3D features. Based on the obtained features, a multimodal distillation learning module is designed to align the 3D features with the CLIP semantic feature space under object-centered constraints to achieve advanced unsupervised semantic segmentation. We conduct comprehensive experiments in both unsupervised and open-vocabulary segmentation, and the results consistently showcase the superiority of our model in…
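The abstract does not give implementation details, so the following is only a toy sketch of the general idea, not the authors' method. It assumes that "object-level denoising" can be approximated by averaging projected 2D CLIP features over the points of each object mask, that distillation uses a cosine-distance loss, and that open-vocabulary labels come from the nearest text embedding. All function names, the 2-dimensional features, and the class names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def pool_by_object(point_feats, object_ids):
    """Average per-point features inside each object mask.

    Assumption: this averaging stands in for the paper's object-level
    denoising; the real module may work differently.
    """
    sums, counts = {}, {}
    for feat, oid in zip(point_feats, object_ids):
        if oid not in sums:
            sums[oid] = [0.0] * len(feat)
            counts[oid] = 0
        sums[oid] = [s + f for s, f in zip(sums[oid], feat)]
        counts[oid] += 1
    return {oid: [s / counts[oid] for s in sums[oid]] for oid in sums}

def distillation_loss(student_feats, object_ids, object_targets):
    """Mean (1 - cosine) between each point feature and its object's target."""
    losses = [1.0 - cosine(f, object_targets[oid])
              for f, oid in zip(student_feats, object_ids)]
    return sum(losses) / len(losses)

def assign_labels(object_feats, text_embeds):
    """Give each object the class whose text embedding is most similar."""
    return {oid: max(text_embeds, key=lambda name: cosine(feat, text_embeds[name]))
            for oid, feat in object_feats.items()}

# Toy data: five points, two object masks, two made-up class embeddings.
projected = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0], [0.8, 0.3]]
object_ids = [0, 0, 1, 1, 0]
text_embeds = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}

pooled = pool_by_object(projected, object_ids)   # object-level targets
labels = assign_labels(pooled, text_embeds)      # open-vocabulary labels
loss = distillation_loss(projected, object_ids, pooled)
print(labels, round(loss, 4))
```

In a real pipeline the student features would come from a trainable 3D backbone and the text embeddings from CLIP's text encoder; here the projected features double as the student output purely to keep the example self-contained.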

Taxonomy

Methods: Contrastive Language-Image Pre-training · ALIGN