Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder

Huiwon Jang; Jihoon Tack; Daewon Choi; Jongheon Jeong; Jinwoo Shin

arXiv:2310.16318 · cs.LG · October 26, 2023 · 1 citation



TL;DR

This paper introduces MetaMAE, a modality-agnostic self-supervised learning framework that leverages meta-learning to improve masked auto-encoder performance across diverse data modalities.

Contribution

It develops MAE into a unified, modality-agnostic SSL framework by treating mask reconstruction as a meta-learning task, which enables effective SSL across multiple modalities without relying on domain-specific knowledge.

Findings

01

MetaMAE significantly outperforms prior baselines on the DABS benchmark.

02

The integration of meta-learning improves the reconstruction quality in SSL.

03

MetaMAE demonstrates strong generalization across different data modalities.

Abstract

Despite its practical importance across a wide range of modalities, recent advances in self-supervised learning (SSL) have primarily focused on a few well-curated domains, e.g., vision and language, often relying on their domain-specific knowledge. For example, the Masked Auto-Encoder (MAE) has become one of the popular architectures in these domains, but its potential in other modalities remains less explored. In this paper, we develop MAE as a unified, modality-agnostic SSL framework. In turn, we argue that meta-learning is key to interpreting MAE as a modality-agnostic learner and, motivated by the goal of jointly improving its SSL across diverse modalities, propose enhancements to MAE, coined MetaMAE as a result. Our key idea is to view the mask reconstruction of MAE as a meta-learning task: masked tokens are predicted by adapting the Transformer meta-learner through the amortization of…
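The key idea above, viewing MAE's mask reconstruction as a meta-learning task, can be illustrated with a minimal sketch in plain Python. It assumes inputs from any modality have already been converted into a sequence of token vectors. Random masking then splits the tokens into a visible set, read here as the meta-learning support (context) set, and a masked set, read here as the query set, and the reconstruction loss is computed only on the masked positions. This support/query mapping is one reading of the truncated abstract. The function names, the 0.75 mask ratio, and the mean-of-visible-tokens predictor are hypothetical illustrations rather than the authors' implementation in alinlab/metamae; the predictor is only a stand-in for the Transformer meta-learner.

```python
import random
from typing import List, Sequence, Tuple

Token = List[float]  # one token embedding; its source modality is irrelevant here


def split_support_query(
    tokens: Sequence[Token], mask_ratio: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """MAE-style random masking: return (visible/support, masked/query) indices."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens to mask some and keep some")
    if not 0.0 < mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in (0, 1)")
    # Keep at least one visible and at least one masked token.
    num_masked = min(max(int(round(n * mask_ratio)), 1), n - 1)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # seeded, so the split is reproducible
    query = sorted(order[:num_masked])
    support = sorted(order[num_masked:])
    return support, query


def masked_mse(
    predictions: Sequence[Token], targets: Sequence[Token], query: Sequence[int]
) -> float:
    """Mean squared reconstruction error over the masked (query) positions only."""
    total, count = 0.0, 0
    for i in query:
        for p, t in zip(predictions[i], targets[i]):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no masked positions to score")
    return total / count


def mean_of_support_predictor(
    tokens: Sequence[Token], support: Sequence[int]
) -> List[Token]:
    """Hypothetical stand-in for the Transformer meta-learner: predicts every
    token as the mean of the visible (support) tokens."""
    dim = len(tokens[0])
    mean = [sum(tokens[i][d] for i in support) / len(support) for d in range(dim)]
    return [list(mean) for _ in tokens]


if __name__ == "__main__":
    toy_tokens = [[float(i), float(2 * i)] for i in range(8)]
    sup, qry = split_support_query(toy_tokens, mask_ratio=0.75, seed=0)
    preds = mean_of_support_predictor(toy_tokens, sup)
    print("support:", sup, "query:", qry, "loss:", masked_mse(preds, toy_tokens, qry))
```

Scoring only the masked positions reflects the standard MAE objective, in which visible tokens serve as context rather than as reconstruction targets. The meta-learning enhancements that distinguish MetaMAE are not modeled in this sketch.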


Code & Models

Repositories

alinlab/metamae (PyTorch)

Videos

Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder (SlidesLive)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Contrastive Learning · Absolute Position Encodings