Improvements to Self-Supervised Representation Learning for Masked Image Modeling
Jiawei Mao, Xuesong Yin, Yuanqi Chang, Honggu Zhou

TL;DR
This paper introduces Contrastive Masked AutoEncoders (CMAE), an improved masked image modeling approach that enhances representation learning by optimizing encoder-decoder design, incorporating specialized encoder tasks, and focusing on main objects through contrastive cropping.
Contribution
The paper proposes CMAE, a novel MIM method that improves encoder-decoder structure, adds dedicated encoder tasks, and uses contrastive cropping to better learn main object features.
Findings
Achieved 65.84% Top-1 accuracy on Tiny ImageNet with ViT-B.
Outperformed existing methods by +2.89% under the same conditions.
Demonstrated the effectiveness of contrastive cropping in MIM.
Abstract
This paper explores improvements to the masked image modeling (MIM) paradigm. The MIM paradigm enables the model to learn the main object features of the image by masking the input image and predicting the masked part by the unmasked part. We found the following three main directions for MIM to be improved. First, since both encoders and decoders contribute to representation learning, MIM uses only encoders for downstream tasks, which ignores the impact of decoders on representation learning. Although the MIM paradigm already employs small decoders with asymmetric structures, we believe that continued reduction of decoder parameters is beneficial to improve the representational learning capability of the encoder. Second, MIM solves the image prediction task by training the encoder and decoder together, and does not design a separate task for the encoder. To further enhance the…
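The masking step the abstract describes (hide part of the image, predict it from the rest) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the helper names (`patchify`, `random_mask`, `masked_mse`), the 16-pixel patch size, the 0.75 mask ratio, and the mean-patch "prediction" standing in for a trained decoder are all assumptions made for illustration only.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into (N, patch*patch*C) flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def random_mask(num_patches, mask_ratio, rng):
    """Boolean array of length num_patches; True marks a masked patch."""
    num_masked = int(round(num_patches * mask_ratio))
    order = rng.permutation(num_patches)
    mask = np.zeros(num_patches, dtype=bool)
    mask[order[:num_masked]] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error computed over the masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))            # random stand-in for a 64x64 RGB image
patches = patchify(image, patch=16)        # patch size is an assumption
mask = random_mask(len(patches), 0.75, rng)  # mask ratio is an assumption
visible = patches[~mask]                   # the only patches an encoder would see

# Placeholder prediction: repeat the mean visible patch for every position.
# A real MIM model would use an encoder-decoder network here.
pred = np.tile(visible.mean(axis=0), (len(patches), 1))
loss = masked_mse(pred, patches, mask)
```

Scoring the loss only on masked patches means the objective rewards inferring hidden content from visible context rather than copying the input.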
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · AI in cancer detection · Advanced Neural Network Applications
Methods: Masked autoencoder · Mutual Information Machine/Mask Image Modeling
