Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

Siyuan Li; Luyuan Zhang; Zedong Wang; Di Wu; Lirong Wu; Zicheng Liu; Jun Xia; Cheng Tan; Yang Liu; Baigui Sun; Stan Z. Li

arXiv:2401.00897 · cs.CV · January 10, 2024 · 5 cites

TL;DR

This paper provides a comprehensive review of masked modeling techniques in self-supervised learning, covering their methodologies, applications across domains, and future research directions.

Contribution

The survey systematically analyzes masked modeling frameworks, compares methods across fields, and discusses limitations and future prospects in self-supervised representation learning.

Findings

01 Masked modeling enables deep models to learn robust representations.

02 It is effective across vision, language, and other modalities.

03 The survey identifies key challenges and future research directions.

Abstract

As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied self-supervised techniques, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training. This paradigm enables deep models to learn robust representations and has demonstrated exceptional performance in the context of computer vision, natural language processing, and other modalities. In this survey, we present a comprehensive review of the masked modeling framework and its methodology. We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more. Then, we systematically…
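The abstract describes masked modeling as predicting the parts of the original data that are proportionally masked during training. A minimal sketch of that idea follows, in plain Python. It is an illustration under stated assumptions, not the survey's or any specific method's implementation: the helper names (`random_mask`, `apply_mask`, `masked_mse`), the 0.75 mask ratio, the zero mask value, and the placeholder "model" that predicts the mean of the visible values are all hypothetical choices, and scalar values stand in for image patches or text tokens.

```python
import random


def random_mask(num_tokens, mask_ratio, seed=0):
    """Pick which positions to hide; the count is proportional to mask_ratio."""
    rng = random.Random(seed)
    num_masked = int(round(num_tokens * mask_ratio))
    return sorted(rng.sample(range(num_tokens), num_masked))


def apply_mask(tokens, masked_idx, mask_value=0.0):
    """Replace the masked positions with a fixed mask value (assumed to be 0.0)."""
    hidden = set(masked_idx)
    return [mask_value if i in hidden else t for i, t in enumerate(tokens)]


def masked_mse(prediction, target, masked_idx):
    """Mean squared error computed only over the masked positions."""
    if not masked_idx:
        return 0.0
    total = sum((prediction[i] - target[i]) ** 2 for i in masked_idx)
    return total / len(masked_idx)


# Toy input: 16 scalar "tokens" standing in for patches or words.
tokens = [0.1 * i for i in range(16)]
masked_idx = random_mask(len(tokens), mask_ratio=0.75)
corrupted = apply_mask(tokens, masked_idx)

# Placeholder "model" (an assumption): predict the mean of the visible tokens
# at every position. A real system would use a trained network here.
hidden = set(masked_idx)
visible = [t for i, t in enumerate(tokens) if i not in hidden]
mean_visible = sum(visible) / len(visible)
prediction = [mean_visible] * len(tokens)

# The training signal is the reconstruction error on the hidden positions only.
loss = masked_mse(prediction, tokens, masked_idx)
print(f"masked {len(masked_idx)} of {len(tokens)} tokens, loss = {loss:.4f}")
```

Restricting the loss to masked positions is one common design choice; the masking strategy, the recovery target, and the network architecture are exactly the axes the abstract says the survey elaborates on.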

Code & Models

Repositories

lupin1998/awesome-mim
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition