ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-training Model
Yifan Chen, Xiaozhen Qiao, Zhe Sun, and Xuelong Li

TL;DR
This paper introduces ComKD-CLIP, a novel knowledge distillation approach that effectively transfers knowledge from large CLIP models to smaller ones, maintaining high performance while reducing computational requirements.
Contribution
The paper proposes a comprehensive distillation framework with two mechanisms, IFAlign and EduAttention, to improve smaller CLIP models' performance by mimicking large teacher models.
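To make "mimicking a teacher" concrete, here is a minimal sketch of a generic logit-based distillation loss for CLIP-style models. It is not the paper's IFAlign or EduAttention, whose details are not given in this summary. The function names, the logit scale of 100, and the temperature handling are all illustrative assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_logits(image_feature, text_features, scale=100.0):
    """CLIP-style logits: scaled cosine similarity of one image feature
    against each class text feature. The scale value is an assumption."""
    return [scale * cosine(image_feature, t) for t in text_features]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over temperature-softened class distributions,
    multiplied by temperature**2 (a common convention, assumed here)."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Hypothetical toy features: the teacher and student may use different
# embedding widths, since each is compared only with its own text features.
teacher = clip_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
student = clip_logits([0.6, 0.8, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
loss = distillation_loss(teacher, student, temperature=4.0)
```

The loss is zero when the student reproduces the teacher's class distribution and grows as the two diverge, so minimizing it pushes the smaller model toward the teacher's predictions.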
Findings
Outperforms existing methods on 11 datasets
Achieves comparable performance with significantly fewer parameters
Demonstrates effective knowledge transfer in multimodal tasks
Abstract
Contrastive Language-Image Pre-training (CLIP) models excel at integrating semantic information between images and text through contrastive learning techniques, and they have achieved remarkable performance in various multimodal tasks. However, the deployment of large CLIP models is hindered in resource-limited environments, while smaller models frequently fail to meet the performance benchmarks required for practical applications. In this paper, we propose a novel approach, ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-training Model, which aims to comprehensively distill the knowledge from a large teacher CLIP model into a smaller student model, ensuring comparable performance with significantly reduced parameters. ComKD-CLIP is composed of two key mechanisms: Image Feature Alignment (IFAlign) and Educational Attention (EduAttention). IFAlign makes the…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training · Knowledge Distillation · Contrastive Learning
