Understanding Masked Image Modeling via Learning Occlusion Invariant Feature

Xiangwen Kong; Xiangyu Zhang

arXiv:2208.04164 · cs.CV · August 9, 2022 · 1 citation

Open Access

TL;DR

This paper reveals that Masked Image Modeling (MIM) implicitly learns occlusion-invariant features, providing a new understanding of its success and unifying it with other siamese self-supervised learning methods.

Contribution

It introduces a new perspective that MIM learns occlusion-invariant features and unifies MIM with siamese approaches, clarifying the underlying mechanisms.

Findings

01 MIM can be interpreted as learning occlusion-invariant features.

02 The success of MIM is more related to learned features than similarity functions.

03 Occlusion-invariant features serve as a good initialization for vision transformers.

Abstract

Masked Image Modeling (MIM) has recently achieved great success in self-supervised visual recognition. However, because it is a reconstruction-based framework, how MIM works remains an open question: it appears very different from previously well-studied siamese approaches such as contrastive learning. In this paper, we propose a new viewpoint: MIM implicitly learns occlusion-invariant features, which is analogous to other siamese methods, while the latter learn other invariances. By relaxing the MIM formulation into an equivalent siamese form, MIM methods can be interpreted in a unified framework with conventional methods, in which only a) the data transformations, i.e. what invariance to learn, and b) the similarity measurements differ. Furthermore, taking MAE (He et al.) as a representative example of MIM, we empirically find the success of MIM models relates a little to the…
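
The siamese reading described in the abstract can be illustrated with a toy sketch. This is not the paper's formulation or code: the mean-pooling encoder, the squared-distance similarity, the zero-fill occlusion, and all function names below are hypothetical stand-ins chosen only to show the two pluggable pieces the abstract names, namely the data transformation (here, random patch occlusion) and the similarity measurement.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a 0/1 list; 1 marks a visible patch, 0 an occluded one."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [0 if i in masked else 1 for i in range(num_patches)]


def occlude(patches, mask):
    """Data transformation: zero out every occluded patch."""
    return [[value * keep for value in patch] for patch, keep in zip(patches, mask)]


def toy_encoder(patches):
    """Hypothetical encoder: average the patches along each feature dimension."""
    count = len(patches)
    dim = len(patches[0])
    return [sum(patch[d] for patch in patches) / count for d in range(dim)]


def squared_distance(a, b):
    """Similarity measurement (assumed): squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def siamese_loss(patches, mask, encoder=toy_encoder, distance=squared_distance):
    """Compare features of the occluded view with features of the full view."""
    z_occluded = encoder(occlude(patches, mask))
    z_full = encoder(patches)
    return distance(z_occluded, z_full)


rng = random.Random(0)
patches = [[rng.random() for _ in range(4)] for _ in range(16)]  # 16 patches, 4 dims
mask = random_patch_mask(16, 0.75, rng)  # 0.75 is an illustrative masking ratio

loss = siamese_loss(patches, mask)
no_occlusion_loss = siamese_loss(patches, [1] * 16)
```

Under this reading, an encoder whose features do not change when patches are occluded would drive the loss toward zero. Swapping `occlude` for another transformation, or `squared_distance` for another similarity, corresponds to the two axes along which the abstract says the methods differ.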

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Masked autoencoder · Mutual Information Machine/Mask Image Modeling