Understanding Self-Supervised Pretraining with Part-Aware Representation Learning
Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang

TL;DR
This paper investigates how self-supervised pretraining methods learn part-aware representations, revealing their strengths in part-level recognition and the complementary nature of contrastive learning and masked image modeling.
Contribution
It explains contrastive learning as a part-to-whole task and masked image modeling as a part-to-part task, and empirically compares how well the resulting self-supervised encoders support part-level and object-level recognition.
Findings
Self-supervised models excel at part-level recognition.
Contrastive learning and masked image modeling are complementary.
Fully supervised models outperform self-supervised ones at object-level recognition.
Abstract
In this paper, we are interested in understanding self-supervised pretraining by studying the extent to which self-supervised representation pretraining methods learn part-aware representations. The study is mainly motivated by the observation that random views, used in contrastive learning, and randomly masked (visible) patches, used in masked image modeling, often cover only object parts. We explain that contrastive learning is a part-to-whole task: the projection layer hallucinates the whole-object representation from the object-part representation learned by the encoder. Masked image modeling, in contrast, is a part-to-part task: the masked patches of the object are hallucinated from the visible patches. This explanation suggests that a self-supervised pretrained encoder is required to understand object parts. We empirically compare the off-the-shelf encoders pretrained with several…
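To make the part-to-whole and part-to-part framing concrete, here is a minimal, dependency-free Python sketch of the pretext-task ingredients the abstract refers to. It is not the authors' code, and every function name in it is hypothetical. `random_crop_box` is a simplified, square-only stand-in for RandomResizedCrop-style view sampling (scale range (0.08, 1.0) assumed), showing how a small crop may cover only part of an object. `random_patch_mask` is MAE-style random masking with an assumed 75% mask ratio. `info_nce` is a standard InfoNCE loss with cosine similarity and an assumed temperature of 0.2. `masked_patch_mse` scores reconstruction only on masked patches. The encoder, projection layer, and decoder are omitted and replaced by plain feature vectors.

```python
import math
import random


def random_crop_box(img_h, img_w, scale=(0.08, 1.0), rng=random):
    """Sample a square crop covering a random fraction of the image area.

    Simplified RandomResizedCrop-style sampling (aspect ratio fixed to 1, an
    assumption). Small scales yield crops that often contain only an object part.
    """
    area = img_h * img_w * rng.uniform(*scale)
    side = max(1, min(int(math.sqrt(area)), img_h, img_w))
    top = rng.randint(0, img_h - side)
    left = rng.randint(0, img_w - side)
    return top, left, side


def random_patch_mask(num_patches, mask_ratio=0.75, rng=random):
    """MAE-style random masking: split patch indices into (visible, masked)."""
    ids = list(range(num_patches))
    rng.shuffle(ids)
    n_keep = int(num_patches * (1 - mask_ratio))
    return sorted(ids[:n_keep]), sorted(ids[n_keep:])


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(query, keys, positive_idx, temperature=0.2):
    """InfoNCE loss for one query (part-to-whole, contrastive learning).

    query: projector output for one view (hypothetical stand-in vector).
    keys:  candidate embeddings; keys[positive_idx] comes from another view
           of the same image, the rest act as negatives.
    """
    logits = [_cosine(query, k) / temperature for k in keys]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[positive_idx]


def masked_patch_mse(pred, target, masked):
    """Mean squared error over masked patches only (part-to-part, MIM)."""
    total = 0.0
    for i in masked:
        diffs = [(p - t) ** 2 for p, t in zip(pred[i], target[i])]
        total += sum(diffs) / len(diffs)
    return total / len(masked)


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_crop_box(224, 224, rng=rng))
    visible, masked = random_patch_mask(196, rng=rng)  # 14 x 14 ViT patch grid
    print(len(visible), len(masked))  # 49 147
```

In the contrastive case, the query would be the projector output for a part-level view and the positive key an embedding of another view of the same image, so minimizing `info_nce` pushes the projector toward a whole-object representation. In the masked-modeling case, the loss ignores visible patches entirely, so the model is scored only on predicting the masked parts from the visible ones.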
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · 3D Surveying and Cultural Heritage
Methods: Contrastive Learning
