Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

Bin Ren; Guofeng Mei; Danda Pani Paudel; Weijie Wang; Yawei Li; Mengyuan Liu; Rita Cucchiara; Luc Van Gool; Nicu Sebe

arXiv:2407.05862 · cs.CV · July 9, 2024 · 1 cites

TL;DR

This paper introduces Point-CMAE, a novel self-supervised learning method for 3D point clouds that combines masked autoencoders with contrastive learning to improve representation quality and transfer performance.

Contribution

The paper reintroduces contrastive learning into MAE-based point cloud pre-training by leveraging the inherent contrastive properties of MAE, which improves downstream task performance.

Findings

01 Point-CMAE outperforms existing MAE methods in various tasks.

02 The method improves transfer learning performance for 3D point cloud applications.

03 Explicit contrastive constraints within MAE enhance representation quality.

Abstract

Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance. To address this limitation, we reintroduce CL into the MAE-based point cloud pre-training paradigm by leveraging the inherent contrastive properties of MAE. Specifically, rather than relying on extensive data augmentation as commonly used in the image domain, we randomly mask the input tokens twice to generate contrastive input pairs. Subsequently, a…
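The double-masking step described in the abstract can be illustrated with a minimal sketch. It only shows how two independent random masks over the same token sequence yield a pair of views; it is not the authors' implementation, and it does not cover the steps elided after "Subsequently". The function names, the mask ratio of 0.6, and the token count of 64 are assumptions chosen for illustration.

```python
import random


def random_mask(num_tokens, mask_ratio, rng):
    """Return a list of booleans; True marks a masked token (hypothetical helper)."""
    num_masked = int(round(num_tokens * mask_ratio))
    masked_idx = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked_idx for i in range(num_tokens)]


def double_random_masking(num_tokens, mask_ratio=0.6, seed=None):
    """Draw two independent random masks over the same token sequence.

    The mask ratio of 0.6 is an assumed value, not taken from the paper.
    """
    rng = random.Random(seed)
    mask_a = random_mask(num_tokens, mask_ratio, rng)
    mask_b = random_mask(num_tokens, mask_ratio, rng)
    return mask_a, mask_b


def visible_indices(mask):
    """Indices of tokens that remain visible (unmasked) under a mask."""
    return [i for i, is_masked in enumerate(mask) if not is_masked]


# Two views of the same 64-token point cloud, differing only in which tokens are hidden.
mask_a, mask_b = double_random_masking(num_tokens=64, mask_ratio=0.6, seed=0)
view_a = visible_indices(mask_a)
view_b = visible_indices(mask_b)
```

Both views come from the same input, so no additional data augmentation is needed to form the pair; only the set of visible tokens differs between them.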

Code & Models

Repositories

amazingren/point-cmae (PyTorch, official)

Taxonomy

Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction

Methods: Masked autoencoder · Contrastive Learning