P3P: Pseudo-3D Pre-training for Scaling 3D Voxel-based Masked Autoencoders

Xuechao Chen; Ying Chen; Jialin Li; Qiang Nie; Hanqiu Deng; Yong Liu; Qixing Huang; Yang Li

arXiv:2408.10007·cs.CV·May 22, 2025

TL;DR

This paper introduces P3P, a self-supervised pre-training framework that efficiently incorporates large-scale image data into 3D voxel-based models, improving 3D perception tasks like classification and segmentation.

Contribution

The paper proposes a novel linear-time tokenizer and a new 3D reconstruction target to enhance 3D pre-training with diverse and large-scale data.
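To make "linear-time tokenizer" concrete, here is a minimal sketch of one common way to group points into tokens in a single pass: hashing each point to its voxel cell. This is an illustrative assumption, not the P3P tokenizer itself; the function name `voxel_tokenize`, the `voxel_size` parameter, and the use of the voxel centroid as the token feature are all hypothetical.

```python
from collections import defaultdict
from math import floor


def voxel_tokenize(points, voxel_size):
    """Group points by voxel cell in one pass (expected O(N) with hashing).

    Assumption: generic sparse-voxel grouping for illustration only,
    not the tokenizer proposed in the paper.
    """
    groups = defaultdict(list)
    for x, y, z in points:
        # Integer cell index of the voxel that contains this point.
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        groups[key].append((x, y, z))

    # One token per occupied voxel; here the token is simply the centroid.
    tokens = {}
    for key, members in groups.items():
        n = len(members)
        tokens[key] = tuple(sum(axis) / n for axis in zip(*members))
    return tokens


points = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.0, 0.0), (-0.1, 0.0, 0.0)]
tokens = voxel_tokenize(points, voxel_size=1.0)
```

In this sketch the number of tokens follows the number of occupied voxels, so it varies with the input rather than being fixed in advance, and no pairwise distance computation is needed.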

Findings

01 Achieves state-of-the-art results in 3D classification

02 Improves performance in few-shot learning

03 Enhances 3D segmentation accuracy

Abstract

3D pre-training is crucial to 3D perception tasks. Nevertheless, limited by the difficulties in collecting clean and complete 3D data, 3D pre-training has persistently faced data scaling challenges. In this work, we introduce a novel self-supervised pre-training framework that incorporates millions of images into 3D pre-training corpora by leveraging a large depth estimation model. New pre-training corpora encounter new challenges in representation ability and embedding efficiency of models. Previous pre-training methods rely on farthest point sampling and k-nearest neighbors to embed a fixed number of 3D tokens. However, these approaches prove inadequate when it comes to embedding millions of samples that feature a diverse range of point numbers, spanning from 1,000 to 100,000. In contrast, we propose a tokenizer with linear-time complexity, which enables the efficient embedding of a…
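As a rough illustration of how an image with an estimated depth map can be turned into 3D points, the sketch below back-projects each pixel through a pinhole camera model. The abstract excerpt does not describe the paper's actual lifting procedure, so the pinhole model, the intrinsics `fx`, `fy`, `cx`, `cy`, and the function name `lift_depth_to_points` are assumptions made for this example.

```python
def lift_depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (list of rows) to 3D points.

    Assumption: a standard pinhole camera with hypothetical intrinsics;
    pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, d in enumerate(row):        # u: pixel column index, d: depth
            if d <= 0:
                continue
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


# Toy 2x2 depth map; in practice the depth would come from a depth estimation model.
depth = [[2.0, 2.0], [0.0, 4.0]]
pseudo_points = lift_depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because only one viewpoint is observed, the resulting point cloud covers just the visible surface, which is one reason such data is described as pseudo-3D.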

Code & Models

Repositories

xuechaochen/p3p-mae
PyTorch · Official

Models

🤗
XuechaoChen/P3P-MAE
model

Datasets

XuechaoChen/P3P-Lift
dataset · 5 downloads

Taxonomy

Topics: Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis