3D Point Cloud Pre-training with Knowledge Distillation from 2D Images
Yuan Yao, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang

TL;DR
This paper introduces a knowledge distillation approach from 2D image models to 3D point cloud models, enhancing 3D pre-training by leveraging rich 2D semantic information to improve various downstream tasks.
Contribution
It proposes a novel cross-attention-based knowledge distillation method that transfers knowledge from 2D image encoders (in particular, the image encoder of CLIP) to 3D point cloud models, addressing the scarcity of 3D pre-training data.
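As an illustration of the cross-attention step, the sketch below shows how a small set of concept queries can pool per-point features into a fixed number of concept features. It assumes the point features come from some 3D backbone and that the concept queries are learnable vectors. The learned query/key/value projections, multi-head splitting, and the backbone itself are omitted, and every name here is hypothetical rather than taken from the paper's code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product cross-attention (hypothetical simplification).

    queries: (num_concepts x d)  assumed learnable concept queries
    keys:    (num_points x d)    point features used as keys
    values:  (num_points x d_v)  point features used as values
    Returns (num_concepts x d_v) concept features, each a weighted
    average of the point values.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    scores = matmul(queries, transpose(keys))  # (num_concepts x num_points)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values)


if __name__ == "__main__":
    concept_queries = [[1.0, 0.0], [0.0, 1.0]]
    point_feats = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    # Point features serve as both keys and values in this sketch.
    print(cross_attention(concept_queries, point_feats, point_feats))
```

Because each output row is a softmax-weighted average over all points, the number of concept features is fixed by the number of queries, not by the size of the point cloud. That is what makes them comparable against a fixed-size 2D teacher representation.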
Findings
Achieves higher accuracy than state-of-the-art 3D pre-training methods.
Improves performance on object classification, detection, and segmentation tasks.
Effective on both synthetic and real-world datasets.
Abstract
The recent success of pre-trained 2D vision models is mostly attributable to learning from large-scale datasets. However, compared with 2D image datasets, the pre-training data currently available for 3D point clouds is limited. To overcome this limitation, we propose a knowledge distillation method for 3D point cloud pre-trained models to acquire knowledge directly from the 2D representation learning model, particularly the image encoder of CLIP, through concept alignment. Specifically, we introduce a cross-attention mechanism to extract concept features from 3D point clouds and compare them with the semantic information from 2D images. In this scheme, the point cloud pre-trained models learn directly from the rich information contained in 2D teacher models. Extensive experiments demonstrate that the proposed knowledge distillation scheme achieves higher accuracy than the state-of-the-art 3D pre-training…
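The step of comparing 3D concept features with 2D semantic information can be illustrated with a simple alignment objective. This sketch assumes the student's concept features and the teacher's features (for example, embeddings from CLIP's image encoder) have already been projected to the same dimension and paired one-to-one. It uses mean cosine distance as a stand-in loss; the paper's actual objective may differ (for instance, it could be a contrastive formulation). In real training the 2D teacher would typically be frozen and only the 3D student updated, whereas this pure-Python version only computes the loss value. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length feature vector")
    return dot / (norm_a * norm_b)


def alignment_loss(student: List[List[float]], teacher: List[List[float]]) -> float:
    """Mean cosine distance between paired 3D (student) and 2D (teacher) features.

    Assumption: student[i] is meant to align with teacher[i]. A loss of 0.0
    means every pair points in the same direction; 2.0 means every pair is
    exactly opposite.
    """
    if len(student) != len(teacher) or not student:
        raise ValueError("student and teacher must be non-empty and equal in length")
    distances = [1.0 - cosine_similarity(s, t) for s, t in zip(student, teacher)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Same direction, different magnitude: cosine distance ignores scale.
    print(alignment_loss([[2.0, 0.0]], [[1.0, 0.0]]))  # prints 0.0
```

Cosine distance is used here because it compares feature directions only, so the student is not forced to match the teacher's feature magnitudes.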
Taxonomy
Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
Methods: Knowledge Distillation · Contrastive Language-Image Pre-training
