Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

TL;DR
This paper introduces Seal, a framework that leverages vision foundation models to efficiently and consistently segment diverse automotive point cloud sequences without requiring annotations, demonstrating superior performance across multiple datasets.
Contribution
Seal is the first method to distill vision foundation models into point cloud segmentation, enabling scalable, consistent, and generalizable knowledge transfer for diverse datasets.
Findings
Achieves 45.0% mIoU on nuScenes after linear probing, surpassing prior methods.
Outperforms existing methods in 20 few-shot fine-tuning tasks.
Demonstrates effectiveness across 11 diverse point cloud datasets.
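The headline number above is reported in mIoU (mean intersection-over-union), the standard metric for semantic segmentation. As a reminder of what it measures, here is a minimal sketch in plain Python. The function name `mean_iou`, its parameters, and the toy labels are illustrative assumptions and do not come from the paper or its code.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Return (mIoU, per-class IoU list) for two equal-length label sequences.

    Hypothetical helper for illustration; not taken from the Seal codebase.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # points where pred == target == c
    pred_count = [0] * num_classes     # points predicted as class c
    target_count = [0] * num_classes   # points labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled points
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        # A class absent from both prediction and ground truth has no IoU.
        ious.append(intersection[c] / union if union > 0 else None)

    valid = [v for v in ious if v is not None]
    miou = sum(valid) / len(valid) if valid else 0.0
    return miou, ious


# Toy example: five points, three classes.
miou, per_class = mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
```

Benchmark scores are typically computed by accumulating these counts over an entire validation set before dividing, rather than averaging per-scan values.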
Abstract
Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. Seal exhibits three appealing properties: i) Scalability: VFMs are directly distilled into point clouds, obviating the need for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial and temporal relationships are enforced at both the camera-to-LiDAR and point-to-segment regularization stages, facilitating cross-modal representation learning. iii) Generalizability: Seal enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets. Extensive experiments…
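The abstract does not spell out the training objective, so the following is only a rough sketch of one common way to distill image features into point features without labels: pool features over shared segments and apply an InfoNCE-style contrastive loss. It assumes that segment ids come from a VFM mask on the image and have been transferred to LiDAR points through camera calibration. All names (`pool_by_segment`, `segment_contrastive_loss`, `temperature`) are hypothetical, and this is not presented as Seal's actual loss.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def pool_by_segment(features, segment_ids):
    """Average the features that share a segment id, then L2-normalize."""
    sums, counts = {}, {}
    for feat, seg in zip(features, segment_ids):
        if seg not in sums:
            sums[seg] = [0.0] * len(feat)
            counts[seg] = 0
        sums[seg] = [s + x for s, x in zip(sums[seg], feat)]
        counts[seg] += 1
    return {
        seg: l2_normalize([s / counts[seg] for s in total])
        for seg, total in sums.items()
    }


def segment_contrastive_loss(point_feats, point_segs, pixel_feats, pixel_segs,
                             temperature=0.07):
    """InfoNCE over segments: each pooled point feature should match the pooled
    pixel feature of the same segment and differ from all other segments."""
    points = pool_by_segment(point_feats, point_segs)
    pixels = pool_by_segment(pixel_feats, pixel_segs)
    shared = sorted(set(points) & set(pixels))  # segments seen by both sensors
    if not shared:
        return 0.0

    loss = 0.0
    for i, seg in enumerate(shared):
        logits = [
            sum(a * b for a, b in zip(points[seg], pixels[other])) / temperature
            for other in shared
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(shared)


# Toy example: three LiDAR points in two segments, one image feature per segment.
point_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
point_segs = [0, 0, 1]
aligned = segment_contrastive_loss(point_feats, point_segs,
                                   [[1.0, 0.0], [0.0, 1.0]], [0, 1])
misaligned = segment_contrastive_loss(point_feats, point_segs,
                                      [[0.0, 1.0], [1.0, 0.0]], [0, 1])
```

In this toy setup the loss is near zero when point and image features agree per segment and large when they are swapped, which is the signal such an objective would use to pull the two modalities together.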
Taxonomy
Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Robotics and Sensor-Based Localization
