Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks

Jiachun Pan; Pan Zhou; Shuicheng Yan

arXiv:2206.03826·cs.LG·February 14, 2023·6 cites

TL;DR

This paper provides a theoretical understanding of how mask-reconstruction pretraining (MRP) captures semantic features and why it enhances downstream task performance, supported by empirical validation.

Contribution

The paper offers a theoretical analysis showing that MRP can learn all discriminative features in the pretraining data, and uses this to explain why it outperforms supervised learning from scratch on downstream tasks.

Findings

01

MRP captures all discriminative features in pretraining.

02

Pretraining dataset diversity ensures feature retention during fine-tuning.

03

MRP outperforms supervised learning on classification tasks.

Abstract

For unsupervised pretraining, mask-reconstruction pretraining (MRP) approaches, e.g. MAE and data2vec, randomly mask input patches and then reconstruct the pixels or semantic features of these masked patches via an auto-encoder. For a downstream task, supervised fine-tuning of the pretrained encoder then remarkably surpasses conventional "supervised learning" (SL) trained from scratch. However, it is still unclear 1) how MRP performs semantic feature learning in the pretraining phase and 2) why it helps in downstream tasks. To answer these questions, we first theoretically show that on an auto-encoder with a two-layer convolutional encoder and a one-layer convolutional decoder, MRP can capture all discriminative features of each potential semantic class in the pretraining dataset. Then considering the fact that the pretraining dataset is of huge size and high diversity and thus covers most features in downstream…
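The mask-then-reconstruct step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's model: the helper names (`patchify`, `random_mask`, `masked_reconstruction_loss`), the 75% mask ratio, and the tied-weight linear encoder/decoder are all assumptions made for illustration, standing in for the two-layer convolutional encoder and one-layer convolutional decoder analyzed in the paper. The loss is scored only on masked patches, following the MAE convention; whether the paper's analysis uses exactly this objective is likewise an assumption.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W) image into non-overlapping flattened patches."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    rows, cols = h // patch, w // patch
    blocks = image.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return blocks.reshape(rows * cols, patch * patch)

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector where True marks a masked patch."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_reconstruction_loss(patches, mask, encode, decode):
    """Mean squared reconstruction error over the masked patches only."""
    visible = patches * (~mask)[:, None]      # zero out the masked patches
    recon = decode(encode(visible))           # auto-encoder forward pass
    return float(np.mean((recon[mask] - patches[mask]) ** 2))

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))               # hypothetical toy input
patches = patchify(image, patch=4)            # 4 patches of 16 pixels each
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Toy tied-weight encoder/decoder (an assumption, not the paper's network).
W = rng.normal(scale=0.1, size=(16, 8))
encode = lambda x: np.maximum(x @ W, 0.0)     # linear map followed by ReLU
decode = lambda z: z @ W.T

loss = masked_reconstruction_loss(patches, mask, encode, decode)
```

In pretraining, this loss would be minimized over the encoder and decoder weights across many images and random masks; fine-tuning would then keep the encoder and train it with labels on the downstream task.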

Taxonomy

Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Domain Adaptation and Few-Shot Learning

Methods: Masked autoencoder · Convolution