Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations
Ziyu Jiang, Yinpeng Chen, Mengchen Liu, Dongdong Chen, Xiyang Dai, Lu Yuan, Zicheng Liu, Zhangyang Wang

TL;DR
This paper introduces Layer Grafted Pre-training, a novel method that sequentially combines Masked Image Modeling and Contrastive Learning across different network layers, resulting in more label-efficient visual representations.
Contribution
It proposes a layer-wise training paradigm that assigns MIM to lower layers and CL to higher layers, improving representation quality and few-shot learning performance.
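The layer-wise assignment can be pictured as a simple per-layer plan. The following is a minimal sketch, not the authors' implementation: the names `LayerPlan` and `graft_plan` and the split point of 6 are hypothetical, chosen only for illustration. The one grounded detail is that ViT-B/16 has 12 transformer blocks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerPlan:
    index: int      # position of the block in the network, 0 = lowest
    objective: str  # "MIM" for lower layers, "CL" for higher layers
    stage: int      # 1 = trained first, 2 = trained on top afterwards


def graft_plan(num_layers: int, split: int) -> List[LayerPlan]:
    """Assign MIM to blocks below `split` and CL to blocks at or above it.

    `split` is a hypothetical hyperparameter; the summary above does not
    state where the boundary between lower and higher layers lies.
    """
    if not 0 < split < num_layers:
        raise ValueError("split must leave at least one layer on each side")
    plan = []
    for i in range(num_layers):
        if i < split:
            plan.append(LayerPlan(index=i, objective="MIM", stage=1))
        else:
            plan.append(LayerPlan(index=i, objective="CL", stage=2))
    return plan


# 12 blocks as in ViT-B/16; the split at block 6 is an assumed value.
plan = graft_plan(num_layers=12, split=6)
for p in plan:
    print(f"block {p.index:2d}: {p.objective} (stage {p.stage})")
```

Encoding the plan as data keeps the split point explicit, so it can be varied without touching any training code.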
Findings
Achieves 65.5% Top-1 accuracy on ImageNet-1k in the 1% few-shot setting with ViT-B/16.
Outperforms the MIM and CL baselines by 14.4% and 2.1%, respectively.
Demonstrates superior label efficiency and downstream task performance.
Abstract
Recently, both Contrastive Learning (CL) and Mask Image Modeling (MIM) demonstrate that self-supervision is powerful to learn good representations. However, naively combining them is far from success. In this paper, we start by making the empirical observation that a naive joint optimization of CL and MIM losses leads to conflicting gradient directions - more severe as the layers go deeper. This motivates us to shift the paradigm from combining loss at the end, to choosing the proper learning method per network layer. Inspired by experimental observations, we find that MIM and CL are suitable to lower and higher layers, respectively. We hence propose to combine them in a surprisingly simple, "sequential cascade" fashion: early layers are first trained under one MIM loss, on top of which latter layers continue to be trained under another CL loss. The proposed Layer Grafted Pre-training…
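One common way to quantify "conflicting gradient directions" is the cosine similarity between the gradients that two losses produce for the same layer: values near 1 mean the losses agree, values below 0 mean they pull in opposite directions. The sketch below assumes that metric. The helper names are hypothetical, and the toy gradient vectors are invented purely to exercise the code; they are not results from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero gradient as neither agreeing nor conflicting
    return dot / (norm_u * norm_v)


def conflict_by_layer(grads_mim, grads_cl):
    """Return one cosine similarity per layer, ordered from lowest to highest.

    Each argument is a list with one flattened gradient vector per layer.
    """
    return [cosine_similarity(g1, g2) for g1, g2 in zip(grads_mim, grads_cl)]


# Toy, made-up gradients for three layers (illustration only).
toy_mim = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
toy_cl = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]

sims = conflict_by_layer(toy_mim, toy_cl)
for layer, s in enumerate(sims):
    print(f"layer {layer}: cosine similarity = {s:+.3f}")
```

In a real experiment the per-layer vectors would come from back-propagating each loss separately through the same network and flattening the gradients of each layer's parameters.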
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Infrastructure Maintenance and Monitoring · Advanced Neural Network Applications
Methods: Mutual Information Machine/Mask Image Modeling · Contrastive Learning
