3D Point Cloud Pre-training with Knowledge Distillation from 2D Images

Yuan Yao; Yuanhan Zhang; Zhenfei Yin; Jiebo Luo; Wanli Ouyang; Xiaoshui Huang

arXiv:2212.08974 · cs.CV · December 20, 2022 · 6 cites

TL;DR

This paper introduces a knowledge distillation approach from 2D image models to 3D point cloud models, enhancing 3D pre-training by leveraging rich 2D semantic information to improve various downstream tasks.

Contribution

It proposes a novel cross-attention-based knowledge distillation method from 2D image encoders to 3D point cloud models, addressing data limitations in 3D pre-training.

Findings

01. Achieves higher accuracy than state-of-the-art 3D pre-training methods.

02. Improves performance on object classification, detection, and segmentation tasks.

03. Effective on both synthetic and real-world datasets.

Abstract

The recent success of pre-trained 2D vision models is mostly attributable to learning from large-scale datasets. However, compared with 2D image datasets, the pre-training data currently available for 3D point clouds is limited. To overcome this limitation, we propose a knowledge distillation method that lets 3D point cloud pre-trained models acquire knowledge directly from a 2D representation learning model, specifically the image encoder of CLIP, through concept alignment. We introduce a cross-attention mechanism to extract concept features from 3D point clouds and compare them with the semantic information from 2D images. In this scheme, the point cloud pre-trained models learn directly from the rich information contained in 2D teacher models. Extensive experiments demonstrate that the proposed knowledge distillation scheme achieves higher accuracy than the state-of-the-art 3D pre-training…
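
The concept-alignment idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the single-head attention, the learnable `concept_queries`, the cosine-distance loss, and all shapes are assumptions chosen for illustration, and random arrays stand in for both the point cloud encoder output and the CLIP image-encoder features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: concept queries attend over point tokens."""
    q = queries @ w_q                      # (num_concepts, d)
    k = tokens @ w_k                       # (num_tokens, d)
    v = tokens @ w_v                       # (num_tokens, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)        # each row sums to 1
    return attn @ v, attn

def cosine_distillation_loss(student, teacher):
    """Mean cosine distance between student and teacher features (assumed loss)."""
    s = student / np.linalg.norm(student, axis=-1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

rng = np.random.default_rng(0)
d, num_tokens, num_concepts = 16, 32, 4    # hypothetical sizes

# Stand-in for features produced by a 3D point cloud encoder (student).
point_tokens = rng.normal(size=(num_tokens, d))
# Hypothetical learnable queries, one per concept.
concept_queries = rng.normal(size=(num_concepts, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
# Stand-in for frozen 2D teacher features (e.g. from an image encoder).
teacher_features = rng.normal(size=(num_concepts, d))

concept_features, attn = cross_attention(
    concept_queries, point_tokens, w_q, w_k, w_v
)
loss = cosine_distillation_loss(concept_features, teacher_features)
print(concept_features.shape, round(loss, 4))
```

In a real training loop the teacher features would stay fixed while gradients of the loss update the student encoder and the attention weights; this sketch only shows the forward computation.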

Taxonomy

Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization

Methods: Knowledge Distillation · Contrastive Language-Image Pre-training