Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
Bowen Shi, Xiaopeng Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian

TL;DR
This paper introduces Hybrid Distillation, a method that combines contrastive learning and masked image modeling teachers to enhance feature discrimination and diversity in representation learning.
Contribution
The paper proposes Hybrid Distillation, a strategy that jointly leverages contrastive learning (CL) and masked image modeling (MIM) teachers to balance feature discrimination and diversity.
Findings
Hybrid Distill outperforms previous methods on multiple benchmarks.
The approach effectively balances global pattern recognition and local attention.
Progressive token masking reduces training costs and improves robustness.
Abstract
Representation learning has been evolving from traditional supervised training to Contrastive Learning (CL) and Masked Image Modeling (MIM). Previous works have demonstrated their pros and cons in specific scenarios, i.e., CL and supervised pre-training excel at capturing longer-range global patterns and enabling better feature discrimination, while MIM can introduce more local and diverse attention across all transformer layers. In this paper, we explore how to obtain a model that combines their strengths. We start by examining previous feature distillation and mask feature reconstruction methods and identify their limitations. We find that their increasing diversity mainly derives from the asymmetric designs, but these designs may in turn compromise the discrimination ability. In order to better obtain both discrimination and diversity, we propose a simple but effective Hybrid…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: Contrastive Learning · Masked Image Modeling
