MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong, Jianmin Bao, Yinglin Zheng, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

TL;DR
MaskCLIP adds masked self-distillation to contrastive language-image pretraining, which strengthens both local and global representation learning and improves performance across multiple downstream tasks.
Contribution
The paper proposes masked self-distillation, a method that improves local patch representations and local semantic understanding in contrastive language-image pretraining.
Findings
Achieves superior zero-shot and fine-tuning results.
Enhances local semantic representation learning.
Outperforms existing methods on various benchmarks.
Abstract
This paper presents a simple yet effective framework MaskCLIP, which incorporates a newly proposed masked self-distillation into contrastive language-image pretraining. The core idea of masked self-distillation is to distill representation from a full image to the representation predicted from a masked image. Such incorporation enjoys two vital benefits. First, masked self-distillation targets local patch representation learning, which is complementary to vision-language contrastive focusing on text-related representation. Second, masked self-distillation is also consistent with vision-language contrastive from the perspective of training objective as both utilize the visual encoder for feature aligning, and thus is able to learn local semantics getting indirect supervision from the language. We provide specially designed experiments with a comprehensive analysis to validate the two…
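The core idea in the abstract, distilling the representation of a full image into the representation predicted from a masked image, can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the stand-in `toy_encoder`, the 0.75 mask ratio, the cosine-based loss, and the exponential-moving-average (EMA) teacher update. Only the forward pass and loss are shown; a real setup would also train the student by gradient descent.

```python
import math
import random


def random_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans; True marks a masked patch."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def toy_encoder(patches, weight, mask=None):
    """Hypothetical stand-in for a visual encoder.

    Each output feature is the patch itself (zeroed if masked) plus the mean
    of the visible patches, scaled elementwise by `weight`, so features at
    masked positions are predicted from the visible context.
    """
    dim = len(patches[0])
    visible = [p for i, p in enumerate(patches) if mask is None or not mask[i]]
    context = [sum(p[d] for p in visible) / len(visible) for d in range(dim)]
    feats = []
    for i, p in enumerate(patches):
        own = [0.0] * dim if (mask is not None and mask[i]) else p
        feats.append([weight[d] * (own[d] + context[d]) for d in range(dim)])
    return feats


def cosine(u, v):
    """Cosine similarity with a small epsilon for numerical safety."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def distill_loss(student_feats, teacher_feats, mask):
    """Mean (1 - cosine similarity), taken over masked positions only."""
    losses = [
        1.0 - cosine(s, t)
        for s, t, m in zip(student_feats, teacher_feats, mask)
        if m
    ]
    return sum(losses) / len(losses)


def ema_update(teacher_w, student_w, momentum=0.999):
    """Assumed teacher update: EMA of the student's weights."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_w, student_w)]


rng = random.Random(0)
patches = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]  # 16 patches, dim 8
student_w = [rng.uniform(0.5, 1.5) for _ in range(8)]
teacher_w = list(student_w)

mask = random_mask(len(patches), 0.75, rng)            # assumed mask ratio
teacher_feats = toy_encoder(patches, teacher_w)        # teacher sees the full image
student_feats = toy_encoder(patches, student_w, mask)  # student sees the masked image
loss = distill_loss(student_feats, teacher_feats, mask)
teacher_w = ema_update(teacher_w, student_w)
```

Restricting the loss to masked positions is what makes the objective a local, patch-level one: the student has to predict the teacher's feature for each hidden patch from the visible context alone.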
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
