Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning
Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

TL;DR
This paper introduces Point-CMAE, a novel self-supervised learning method for 3D point clouds that combines masked autoencoders with contrastive learning to improve representation quality and transfer performance.
Contribution
It reintroduces contrastive learning into MAE-based point cloud pretraining by leveraging the inherent contrastive properties of MAE, with the aim of improving downstream task performance.
Findings
Point-CMAE outperforms existing MAE methods in various tasks.
The method improves transfer learning performance for 3D point cloud applications.
Explicit contrastive constraints within MAE enhance representation quality.
Abstract
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance. To address this limitation, we reintroduce CL into the MAE-based point cloud pre-training paradigm by leveraging the inherent contrastive properties of MAE. Specifically, rather than relying on extensive data augmentation as commonly used in the image domain, we randomly mask the input tokens twice to generate contrastive input pairs. Subsequently, a…
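The double-masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `double_random_mask`, the token count of 64, and the mask ratio of 0.6 are hypothetical values chosen for illustration, and the sketch covers only the generation of the two masked views, not whatever the method does with them afterwards.

```python
import random


def double_random_mask(num_tokens, mask_ratio, rng):
    """Draw two independent random masks over the same token sequence.

    Returns two sorted lists of visible (unmasked) token indices, one per view.
    The function name and signature are illustrative assumptions.
    """
    num_masked = round(num_tokens * mask_ratio)
    views = []
    for _ in range(2):
        # Sample the masked positions without replacement for this view.
        masked = set(rng.sample(range(num_tokens), num_masked))
        # Keep the remaining indices; these tokens would be fed to the encoder.
        visible = [i for i in range(num_tokens) if i not in masked]
        views.append(visible)
    return views[0], views[1]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical setting: 64 point-patch tokens, 60% of them masked per view.
visible_a, visible_b = double_random_mask(num_tokens=64, mask_ratio=0.6, rng=rng)
```

Both views come from the same point cloud and differ only in which tokens are hidden, so the pair is produced without any additional data augmentation.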
Taxonomy
Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction
Methods: Masked autoencoder · Contrastive Learning
