MOMA: Distill from Self-Supervised Teachers

Yuchong Yao; Nandakishor Desai; Marimuthu Palaniswami

arXiv:2302.02089·cs.CV·February 7, 2023·1 cites

TL;DR

MOMA is a self-supervised distillation framework that combines knowledge from contrastive learning and masked image modeling to produce compact, high-performing models efficiently.

Contribution

It introduces a novel method to distill knowledge from pre-trained MoCo and MAE models into a single student, enhancing performance while reducing training costs.

Findings

01. MOMA achieves competitive results on various benchmarks.

02. The method reduces training epochs and computational costs.

03. It effectively combines two self-supervised paradigms for improved representations.

Abstract

Contrastive Learning and Masked Image Modelling have demonstrated exceptional performance on self-supervised representation learning, where Momentum Contrast (i.e., MoCo) and Masked AutoEncoder (i.e., MAE) are the state-of-the-art, respectively. In this work, we propose MOMA to distill from pre-trained MoCo and MAE in a self-supervised manner to combine the knowledge from both paradigms. We introduce three different mechanisms of knowledge transfer in the proposed MOMA framework: (1) distill pre-trained MoCo to MAE; (2) distill pre-trained MAE to MoCo; (3) distill pre-trained MoCo and MAE to a randomly initialized student. During the distillation, the teacher and the student are fed with original inputs and masked inputs, respectively. The learning is enabled by aligning the normalized representations from the teacher and the projected representations from the student. This simple…
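
The distillation step described in the abstract (teacher sees the original input, student sees a masked input, and the student's projected representation is aligned with the teacher's normalized representation) can be illustrated with a toy sketch. This is not the authors' implementation: the tanh-and-mean-pool `toy_encoder`, the L2 normalization, the linear projection, the mean-squared-error objective, the 0.75 mask ratio, and all dimensions are assumptions chosen only to make the data flow concrete, and every function name here is hypothetical.

```python
import math
import random


def l2_normalize(vec, eps=1e-8):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def linear(vec, weight):
    """Multiply a vector by a weight matrix stored as rows of shape (in_dim, out_dim)."""
    out_dim = len(weight[0])
    return [sum(v * row[j] for v, row in zip(vec, weight)) for j in range(out_dim)]


def toy_encoder(patches, weight):
    """Hypothetical stand-in for a ViT encoder: per-patch linear map, tanh, mean pool."""
    encoded = [[math.tanh(h) for h in linear(p, weight)] for p in patches]
    return [sum(col) / len(encoded) for col in zip(*encoded)]


def random_mask(patches, mask_ratio, rng):
    """Keep a random subset of patches; return the kept patches and their indices."""
    num_keep = max(1, round(len(patches) * (1.0 - mask_ratio)))
    keep_idx = sorted(rng.sample(range(len(patches)), num_keep))
    return [patches[i] for i in keep_idx], keep_idx


def alignment_loss(teacher_repr, student_repr, proj_weight):
    """Assumed objective: MSE between normalized teacher and projected student features."""
    target = l2_normalize(teacher_repr)
    pred = linear(student_repr, proj_weight)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
patches = rand_matrix(16, 32, rng)    # 16 patches, 32 values each (toy image)
w_teacher = rand_matrix(32, 24, rng)  # frozen, "pre-trained" teacher weights
w_student = rand_matrix(32, 12, rng)  # smaller student to be trained
w_proj = rand_matrix(12, 24, rng)     # projector from student dim to teacher dim

# Teacher encodes the original input; student encodes only the visible patches.
visible, keep_idx = random_mask(patches, mask_ratio=0.75, rng=rng)
teacher_repr = toy_encoder(patches, w_teacher)
student_repr = toy_encoder(visible, w_student)

loss = alignment_loss(teacher_repr, student_repr, w_proj)
print(f"visible patches: {len(visible)}/{len(patches)}, loss: {loss:.4f}")
```

In a real training loop the loss would be backpropagated into the student and projector only, with the teacher kept frozen. Under the three mechanisms listed in the abstract, the teacher would be a pre-trained MoCo or MAE model (or both), and the student an MAE, a MoCo, or a randomly initialized model.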

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Masked autoencoder · InfoNCE · Batch Normalization · Momentum Contrast