SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Gang Li; Heliang Zheng; Daqing Liu; Chaoyue Wang; Bing Su; Changwen Zheng

arXiv:2206.10207 · cs.CV · October 6, 2022 · 49 cites


TL;DR

SemMAE introduces a semantic-guided masking strategy for masked autoencoders (MAE) that masks images according to their semantic parts rather than at random, yielding better image representations and state-of-the-art results on fine-grained recognition.

Contribution

The paper proposes a semantic-guided masking approach for MAE that builds masks from semantic parts, gradually guiding the network from learning intra-part patterns to learning inter-part relations.
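The intra-part to inter-part progression can be sketched in Python. This is a minimal illustration under stated assumptions, not the SemMAE implementation: it assumes every patch already carries a semantic part id, and the helper names (`intra_part_mask`, `inter_part_mask`, `semantic_guided_mask`) and the schedule, in which each image switches to whole-part masking with probability equal to the fraction of training completed, are hypothetical choices made only to show the idea.

```python
import random
from collections import defaultdict


def group_by_part(part_ids):
    """Map each part id to the list of patch indices assigned to it."""
    groups = defaultdict(list)
    for patch, part in enumerate(part_ids):
        groups[part].append(patch)
    return dict(groups)


def intra_part_mask(part_ids, mask_ratio, rng):
    """Hide the same fraction of patches inside every part, keeping at least
    one visible patch per part (targets intra-part patterns)."""
    masked = set()
    for patches in group_by_part(part_ids).values():
        k = min(len(patches) - 1, round(mask_ratio * len(patches)))
        masked.update(rng.sample(patches, k))
    return masked


def inter_part_mask(part_ids, mask_ratio, rng):
    """Hide whole parts until about mask_ratio of all patches are masked,
    always leaving at least one part visible (targets inter-part relations)."""
    budget = round(mask_ratio * len(part_ids))
    groups = list(group_by_part(part_ids).values())
    rng.shuffle(groups)
    masked = set()
    for patches in groups[:-1]:  # the last shuffled part always stays visible
        if len(masked) + len(patches) <= budget:
            masked.update(patches)
    return masked


def semantic_guided_mask(part_ids, mask_ratio, progress, rng=None):
    """Choose a masking mode for one image. progress in [0, 1] is the fraction
    of training completed; this linear schedule is a hypothetical choice."""
    if rng is None:
        rng = random.Random()
    if rng.random() < progress:
        return inter_part_mask(part_ids, mask_ratio, rng)
    return intra_part_mask(part_ids, mask_ratio, rng)


# Usage: 16 patches split into 4 hypothetical parts of 4 patches each.
parts = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
rng = random.Random(0)
early = semantic_guided_mask(parts, 0.75, progress=0.0, rng=rng)
late = semantic_guided_mask(parts, 0.75, progress=1.0, rng=rng)
print(len(early), len(late))  # 12 12
```

With a 75% mask ratio, the early call leaves exactly one visible patch in every part, while the late call hides three complete parts and leaves the fourth untouched, so the model must reason across parts.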

Findings

01. Achieves 84.5% fine-tuning accuracy on ImageNet-1k, outperforming vanilla MAE.
02. Significantly improves performance on semantic segmentation tasks.
03. Yields state-of-the-art results in fine-grained recognition.

Abstract

Recently, significant progress has been made in masked image modeling to catch up to masked language modeling. However, unlike words in NLP, the lack of semantic decomposition of images still makes masked autoencoding (MAE) different between vision and language. In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy. Compared to widely adopted random masking, our masking strategy can gradually guide the network to learn various information, i.e., from intra-part patterns to inter-part relations. In particular, we achieve this in two steps. 1) Semantic part learning: we design a self-supervised part learning method to obtain semantic parts by leveraging and refining the multi-head attention of a ViT-based encoder. 2) Semantic-guided MAE…
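Step 1 can be approximated, very loosely, by treating each attention head of a ViT encoder as a part proposal. The sketch below assumes the per-head attention from the class token to each patch is available as plain Python lists; it normalises each head's map and assigns every patch to the head that attends to it most. The function name `assign_patches_to_parts` is hypothetical, and the sketch omits the self-supervised refinement the abstract describes, so it is a rough stand-in rather than the paper's part learning method.

```python
def assign_patches_to_parts(head_attn):
    """Assign each patch to the attention head that attends to it most.

    head_attn[h][i] is assumed to be the class-token attention of head h on
    patch i. Patch-only rows need not sum to 1, so each head's map is first
    divided by its own sum; otherwise a head with large attention everywhere
    would claim every patch.
    """
    normalised = []
    for row in head_attn:
        total = sum(row)
        normalised.append([a / total for a in row])
    num_heads, num_patches = len(normalised), len(normalised[0])
    # For each patch, pick the head with the highest normalised attention.
    return [
        max(range(num_heads), key=lambda h: normalised[h][i])
        for i in range(num_patches)
    ]


# Usage with two hypothetical heads over four patches.
attn = [
    [0.30, 0.30, 0.20, 0.20],  # head 0: larger raw values, peaks on patches 0-1
    [0.05, 0.05, 0.15, 0.15],  # head 1: smaller raw values, peaks on patches 2-3
]
print(assign_patches_to_parts(attn))  # [0, 0, 1, 1]
```

Without the per-head normalisation, head 0 would win patches 2 and 3 as well (0.20 > 0.15), collapsing everything into a single part.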


Code & Models

Repositories

ucasligang/semmae (PyTorch, official)

Videos

SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders (SlidesLive)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Neural Network Applications

Methods: Masked autoencoder · Softmax · Linear Layer