A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond

Chaoning Zhang; Chenshuang Zhang; Junha Song; John Seon Keun Yi; Kang Zhang; In So Kweon

arXiv:2208.00173·cs.CV·August 2, 2022·38 cites

Open Access

TL;DR

This paper provides a comprehensive survey of masked autoencoders, highlighting their role in self-supervised learning for vision and their potential to bridge the gap with NLP methods like BERT.

Contribution

It is the first survey to review SSL with masked autoencoders in vision, covering historical development, recent progress, and future implications.

Findings

01. Masked autoencoders have revived interest in generative SSL in vision.

02. They show promise in bridging vision and NLP SSL techniques.

03. The survey discusses diverse applications and future directions.

Abstract

"Masked autoencoders are scalable vision learners," the title of MAE \cite{he2022masked}, suggests that self-supervised learning (SSL) in vision might follow a trajectory similar to that in NLP. Specifically, generative pretext tasks with masked prediction (e.g., BERT) have become a de facto standard SSL practice in NLP. By contrast, early attempts at generative methods in vision were overshadowed by their discriminative counterparts (such as contrastive learning); however, the success of masked image modeling has revived the masked autoencoder (often termed denoising autoencoder in the past). As a milestone in bridging the gap with BERT in NLP, the masked autoencoder has attracted unprecedented attention for SSL in vision and beyond. This work conducts a comprehensive survey of masked autoencoders to shed light on a promising direction of SSL. As the first to review SSL with masked…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Masked autoencoder · Linear Layer · Layer Normalization · Adam · WordPiece · Weight Decay · Linear Warmup With Linear Decay · Residual Connection