PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation

Qijian Zhang; Junhui Hou

arXiv:2212.14197 · cs.CV · December 20, 2023 · 1 cites

TL;DR

PointVST introduces a self-supervised pre-training method for 3D point clouds built on a cross-modal pretext task: translating a point cloud into its corresponding 2D rendered images for a specified viewpoint. Pre-training this way improves downstream task performance and domain transfer.

Contribution

It proposes a novel cross-modal translation pretext task for 3D point cloud pre-training, bridging the gap between 3D and 2D representations.

Findings

01 Outperforms state-of-the-art methods on various tasks

02 Demonstrates strong domain transfer ability

03 Shows consistent performance improvements

Abstract

The past few years have witnessed the great success and prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not been fully migrated to the field of 3D point cloud learning. Different from existing pre-training paradigms designed for deep point cloud feature extractors that fall into the scope of generative modeling or contrastive learning, this paper proposes a translative pre-training framework, namely PointVST, driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images. More specifically, we begin with deducing view-conditioned point-wise embeddings through the insertion of the viewpoint indicator, and then adaptively aggregate a view-specific global codeword, which can be further fed into subsequent 2D…
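The first two steps named in the abstract (view-conditioned point-wise embeddings, then adaptive aggregation into a view-specific global codeword) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or its repository: the viewpoint indicator is a 3-vector concatenated to each point, the embedding is a single random linear layer with ReLU, and the aggregation is softmax-weighted pooling. The helper names `embed_points` and `aggregate_codeword` are hypothetical. The 2D stage that consumes the codeword is omitted because the abstract is cut off at that point.

```python
import math
import random


def embed_points(points, view, weights):
    """Toy view-conditioned embedding (assumed design, not the paper's network).

    Each xyz point is concatenated with the viewpoint indicator, then passed
    through one linear layer followed by ReLU.
    """
    embeddings = []
    for p in points:
        x = list(p) + list(view)  # input of length 3 + len(view)
        h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
        embeddings.append(h)
    return embeddings


def aggregate_codeword(embeddings, score_vec):
    """Softmax-weighted pooling of point-wise embeddings into one global vector."""
    scores = [sum(s * h for s, h in zip(score_vec, e)) for e in embeddings]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(embeddings[0])
    codeword = [sum(a * e[d] for a, e in zip(attn, embeddings)) for d in range(dim)]
    return codeword, attn


rng = random.Random(0)
# Hypothetical inputs: 128 random points and two viewing directions.
points = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(128)]
view_a = (1.0, 0.0, 0.0)
view_b = (0.0, 1.0, 0.0)

dim = 16
weights = [[rng.gauss(0, 0.5) for _ in range(6)] for _ in range(dim)]  # random, untrained
score_vec = [rng.gauss(0, 0.5) for _ in range(dim)]

code_a, attn_a = aggregate_codeword(embed_points(points, view_a, weights), score_vec)
code_b, attn_b = aggregate_codeword(embed_points(points, view_b, weights), score_vec)
```

The same point cloud yields a different codeword for each viewpoint, which is the property a view-specific translation target would rely on. In actual pre-training the weights would be learned rather than sampled at random.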

Code & Models

Repositories

keeganhk/pointvst
PyTorch · Official

Models

tsphua/modernbert-fingpt
model · 250 dl

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition