Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

Youquan Liu; Lingdong Kong; Jun Cen; Runnan Chen; Wenwei Zhang; Liang Pan; Kai Chen; Ziwei Liu

arXiv:2306.09347 · cs.CV · October 25, 2023 · 29 cites

TL;DR

This paper introduces Seal, a framework that leverages vision foundation models to efficiently and consistently segment diverse automotive point cloud sequences without requiring annotations, demonstrating superior performance across multiple datasets.

Contribution

Seal is the first method to distill vision foundation models into point cloud segmentation, enabling scalable, consistent, and generalizable knowledge transfer for diverse datasets.

Findings

01  Achieves 45.0% mIoU on nuScenes after linear probing, surpassing prior methods.

02  Outperforms existing methods across 20 few-shot fine-tuning tasks.

03  Demonstrates effectiveness across 11 diverse point cloud datasets.
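
For readers unfamiliar with the mIoU metric quoted in the findings, the following is a minimal sketch of how mean intersection-over-union is usually computed for per-point semantic labels. The function name `mean_iou` and its parameters are illustrative and are not taken from the paper or its repositories. Benchmarks typically accumulate the per-class counts over a whole validation set before averaging, whereas this toy example scores a single short label list.

```python
from collections import Counter


def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: equal-length sequences of integer class ids (one per point).
    ignore_index: optional label in `target` to exclude (hypothetical parameter).
    """
    inter = Counter()         # points where prediction and label agree, per class
    pred_count = Counter()    # points predicted as each class
    target_count = Counter()  # points labelled as each class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union == 0:
            continue  # class absent from both prediction and label: skip it
        ious.append(inter[c] / union)
    return sum(ious) / len(ious) if ious else float("nan")


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoU: class 0 -> 1/3, class 1 -> 2/3, class 2 -> 1/2
print(mean_iou(pred, target, num_classes=3))
```

The per-class IoU values in the toy example are 1/3, 2/3, and 1/2, so the mean is one half.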

Abstract

Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. Seal exhibits three appealing properties: i) Scalability: VFMs are directly distilled into point clouds, obviating the need for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial and temporal relationships are enforced at both the camera-to-LiDAR and point-to-segment regularization stages, facilitating cross-modal representation learning. iii) Generalizability: Seal enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets. Extensive experiments…
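
The abstract describes the distillation only at a high level. As a rough illustration of what segment-level camera-to-LiDAR distillation can look like, the sketch below pools per-point features into segments and applies a generic InfoNCE contrastive loss against matching image-segment features. This is an assumption about the general technique, not the authors' implementation: the function names, the temperature value, and the premise that each point already carries the id of the image segment it projects into (via camera calibration) are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def pool_by_segment(features, segment_ids, num_segments):
    """Average the features of all elements that share a segment id."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_segments)]
    counts = [0] * num_segments
    for feat, seg in zip(features, segment_ids):
        counts[seg] += 1
        for d in range(dim):
            sums[seg][d] += feat[d]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def segment_contrastive_loss(point_segs, pixel_segs, temperature=0.07):
    """InfoNCE: the i-th point segment should match the i-th pixel segment
    and be dissimilar to every other pixel segment."""
    q = [l2_normalize(v) for v in point_segs]
    k = [l2_normalize(v) for v in pixel_segs]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(q)


# Toy data: four LiDAR points with 2-D features, grouped into two segments.
point_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
point_seg_ids = [0, 0, 1, 1]  # hypothetical ids from projecting points into image masks
pixel_segs = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for frozen image-model segment features

point_segs = pool_by_segment(point_feats, point_seg_ids, num_segments=2)
aligned = segment_contrastive_loss(point_segs, pixel_segs)
swapped = segment_contrastive_loss(point_segs, pixel_segs[::-1])
print(aligned < swapped)
```

In a real pipeline the point features would come from a trainable 3D backbone and the pixel features from a frozen image model, so minimizing a loss of this kind would pull the 3D features toward the image features without any human labels.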

Taxonomy

Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Robotics and Sensor-Based Localization