CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets

Jiange Yang; Sheng Guo; Gangshan Wu; Limin Wang

arXiv:2302.06148·cs.CV·February 14, 2023

TL;DR

CoMAE introduces a unified self-supervised pre-training framework for RGB and depth data that enhances scene recognition performance on small datasets by combining contrastive learning and masked autoencoding.

Contribution

This work proposes a novel single-model hybrid pre-training approach for RGB-D data, integrating contrastive learning and masked autoencoding with curriculum learning.

Findings

01. Effective on SUN RGB-D and NYUDv2 datasets

02. Data-efficient, performs well with small unlabeled datasets

03. Competitive with large-scale supervised pre-training

Abstract

Current RGB-D scene recognition approaches often train two standalone backbones for the RGB and depth modalities, both initialized from the same Places or ImageNet pre-training. However, the pre-trained depth network is still biased by RGB-based models, which may result in a suboptimal solution. In this paper, we present a single-model self-supervised hybrid pre-training framework for RGB and depth modalities, termed CoMAE. CoMAE uses a curriculum learning strategy to unify two popular self-supervised representation learning algorithms: contrastive learning and masked image modeling. Specifically, we first build a patch-level alignment task to pre-train a single encoder shared by the two modalities via cross-modal contrastive learning. Then, the pre-trained contrastive encoder is passed to a multi-modal masked autoencoder to capture finer context features from a generative perspective. In…
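
The patch-level alignment task in the first stage can be illustrated with a small, dependency-free sketch. It is not taken from the official repository: the function names (`info_nce`, `patch_alignment_loss`), the symmetric InfoNCE form, and the temperature value are assumptions chosen for illustration. The sketch treats the RGB and depth embeddings of the same patch position as a positive pair and all other patch positions as negatives.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, keys, temperature=0.1):
    """Mean cross-entropy in which queries[i] should match keys[i].

    Every other key acts as a negative for query i.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def patch_alignment_loss(rgb_patches, depth_patches, temperature=0.1):
    """Symmetric cross-modal loss over patch embeddings (assumed form).

    rgb_patches[i] and depth_patches[i] are embeddings of the same patch
    position, produced by the single shared encoder.
    """
    rgb = [l2_normalize(p) for p in rgb_patches]
    depth = [l2_normalize(p) for p in depth_patches]
    rgb_to_depth = info_nce(rgb, depth, temperature)
    depth_to_rgb = info_nce(depth, rgb, temperature)
    return 0.5 * (rgb_to_depth + depth_to_rgb)


# Toy embeddings: two patch positions, two feature dimensions.
aligned = patch_alignment_loss([[1.0, 0.0], [0.0, 1.0]],
                               [[1.0, 0.0], [0.0, 1.0]])
swapped = patch_alignment_loss([[1.0, 0.0], [0.0, 1.0]],
                               [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each RGB patch embedding is closest to the depth embedding at the same position and large when the correspondence is broken, which is the signal a shared encoder would be trained on. A practical implementation would use batched tensors in a framework such as PyTorch rather than Python lists.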

Code & Models

Repositories

mcg-nju/comae (PyTorch, official)

Videos

CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging · Image Processing Techniques and Applications

Methods: Contrastive Learning