ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model

Yifan Chen, Xiaozhen Qiao, Zhe Sun, and Xuelong Li

arXiv:2408.04145 · cs.CV · August 22, 2024

Open Access

TL;DR

This paper introduces ComKD-CLIP, a novel knowledge distillation approach that effectively transfers knowledge from large CLIP models to smaller ones, maintaining high performance while reducing computational requirements.

Contribution

The paper proposes a comprehensive distillation framework with two mechanisms, IFAlign and EduAttention, to improve smaller CLIP models' performance by mimicking large teacher models.

Findings

1. Outperforms existing methods on 11 datasets

2. Achieves comparable performance with significantly fewer parameters

3. Demonstrates effective knowledge transfer in multimodal tasks

Abstract

Contrastive Language-Image Pre-training (CLIP) models excel at integrating semantic information between images and text through contrastive learning techniques. They have achieved remarkable performance in various multimodal tasks. However, the deployment of large CLIP models is hindered in resource-limited environments, while smaller models frequently fail to meet the performance benchmarks required for practical applications. In this paper, we propose a novel approach, ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model, which aims to comprehensively distill the knowledge from a large teacher CLIP model into a smaller student model, ensuring comparable performance with significantly reduced parameters. ComKD-CLIP is composed of two key mechanisms: Image Feature Alignment (IFAlign) and Educational Attention (EduAttention). IFAlign makes the…

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training · Knowledge Distillation · Contrastive Learning