SdAE: Self-distillated Masked Autoencoder
Yabo Chen, Yuchen Liu, Dongsheng Jiang, Xiaopeng Zhang, Wenrui Dai, Hongkai Xiong, Qi Tian

TL;DR
SdAE is a self-distilled masked autoencoder for representation learning. It pairs a student encoder-decoder with a teacher that produces latent representations, and uses multi-fold masking to improve performance and efficiency.
Contribution
The paper proposes a self-distilled masked autoencoder architecture with a multi-fold masking strategy that improves pre-training efficiency and downstream task performance.
Findings
Achieves 84.1% ImageNet accuracy after 300 epochs of pre-training.
Surpasses other methods on segmentation and detection benchmarks.
Reduces computational complexity with multi-fold masking.
Abstract
With the development of generative self-supervised learning (SSL) approaches like BeiT and MAE, how to learn good representations by masking random patches of the input image and reconstructing the missing information has attracted growing attention. However, BeiT and PeCo need a "pre-pretraining" stage to produce discrete codebooks for representing masked patches. MAE does not require a codebook pre-training process, but setting pixels as reconstruction targets may introduce an optimization gap between pre-training and downstream tasks, in that good reconstruction quality may not always lead to high descriptive capability for the model. Considering the above issues, in this paper, we propose a simple Self-distillated masked AutoEncoder network, namely SdAE. SdAE consists of a student branch using an encoder-decoder structure to reconstruct the missing information, and a teacher branch…
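The abstract is cut off before it describes the teacher branch, so the following is only a rough sketch of how a student/teacher split with multi-fold masking could be organized, not the authors' implementation. It assumes that "multi-fold masking" means partitioning the masked patch indices into disjoint groups that the teacher processes separately, and that the teacher is updated as an exponential moving average of the student; both are assumptions, as are the helper names `split_patches` and `ema_update` and the values 196 patches, 0.75 mask ratio, 3 folds, and 0.996 momentum.

```python
import random


def split_patches(num_patches, mask_ratio, num_folds, seed=0):
    """Return (visible, folds).

    visible: patch indices given to the student encoder.
    folds:   the masked indices, partitioned into `num_folds` disjoint
             groups (assumed here to be fed to the teacher one at a time).
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked, visible = order[:num_masked], order[num_masked:]
    # Round-robin partition, so fold sizes differ by at most one.
    folds = [masked[i::num_folds] for i in range(num_folds)]
    return visible, folds


def ema_update(teacher, student, momentum=0.996):
    """Assumed teacher update: exponential moving average of student weights.

    Weights are plain lists of floats here to keep the sketch dependency-free.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Illustrative values only: a 14 x 14 patch grid, 75% masking, 3 folds.
visible, folds = split_patches(num_patches=196, mask_ratio=0.75, num_folds=3)
print(len(visible), [len(f) for f in folds])
```

With these illustrative numbers the student sees 49 patches and each of the three folds also holds 49. Because standard self-attention cost grows quadratically with the number of tokens, three 49-token passes are cheaper than one 147-token pass, which is one plausible reading of the efficiency claim in the findings.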
Taxonomy
Topics: Image Processing Techniques and Applications · Digital Media Forensic Detection · Domain Adaptation and Few-Shot Learning
Methods: Masked autoencoder · Stacked Denoising Autoencoder
