Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences
Longlong Jing, Yucheng Chen, Ling Zhang, Mingyi He, Yingli Tian

TL;DR
This paper introduces a self-supervised learning method that jointly learns 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences, eliminating the need for human labels.
Contribution
It presents a novel approach to jointly learn 2D and 3D features using cross-modality and cross-view supervision without human annotations.
Findings
Effective transfer to multiple 2D and 3D shape tasks
Strong generalization across datasets
Outperforms existing self-supervised methods
Abstract
The success of supervised learning depends on large-scale ground-truth labels, which are expensive and time-consuming to collect and may require special skills to annotate. To address this issue, many self-supervised and unsupervised methods have been developed. Unlike most existing self-supervised methods, which learn only 2D image features or only 3D point cloud features, this paper presents a novel and effective self-supervised learning approach that jointly learns both 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences, without using any human-annotated labels. Specifically, 2D image features of rendered images from different views are extracted by a 2D convolutional neural network, and 3D point cloud features are extracted by a graph convolution neural network. The two types of features are fed into a two-layer fully connected neural network to estimate the…
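The fusion step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is cut off before it says what the fully connected network estimates, so the sketch assumes a single "do this image and this point cloud belong to the same object" probability. The feature sizes, the random weights, and the names `init_layer` and `correspondence_score` are all hypothetical, and the feature vectors stand in for the outputs of the 2D CNN and the graph convolution network.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def correspondence_score(image_feat, point_feat, params):
    """Concatenate both feature vectors and apply two FC layers.

    ASSUMPTION: the output is read as the probability that the two
    inputs come from the same object; the source text is truncated
    before it names the estimated quantity.
    """
    (w1, b1), (w2, b2) = params
    fused = image_feat + point_feat          # list concatenation
    hidden = relu(linear(fused, w1, b1))     # first FC layer
    logit = linear(hidden, w2, b2)[0]        # second FC layer, one output
    return 1.0 / (1.0 + math.exp(-logit))    # sigmoid


rng = random.Random(0)
IMG_DIM, PC_DIM, HIDDEN = 8, 8, 16           # hypothetical sizes
params = (init_layer(IMG_DIM + PC_DIM, HIDDEN, rng), init_layer(HIDDEN, 1, rng))

# Placeholder features; in the paper these would come from the two encoders.
image_feat = [rng.random() for _ in range(IMG_DIM)]
point_feat = [rng.random() for _ in range(PC_DIM)]

score = correspondence_score(image_feat, point_feat, params)
```

Concatenation is only one plausible way to combine the two modalities; it keeps the sketch short and makes the two-layer head the only place where image and point cloud information interact.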
Taxonomy
Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Human Pose and Action Recognition
Methods: Convolution
