CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Kaicheng Yang; Tiancheng Gu; Xiang An; Haiqiang Jiang; Xiangzi Dai; Ziyong Feng; Weidong Cai; Jiankang Deng

arXiv:2408.09441·cs.CV·December 17, 2024

Open Access

TL;DR

CLIP-CID introduces an efficient distillation method that reduces data bias and leverages cluster-instance discrimination to transfer knowledge from large CLIP models to smaller ones, achieving state-of-the-art results.

Contribution

The paper presents a novel distillation mechanism combining image semantic balancing and cluster-instance discrimination for vision-language models.

Findings

01 Filters out 43.7% of the image-text pairs in LAION400M while maintaining performance.

02 Achieves state-of-the-art results on downstream tasks.

03 Enhances semantic understanding in smaller models.

Abstract

Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over a wide range of tasks. However, the effectiveness of CLIP relies heavily on a substantial corpus of pre-training data, resulting in notable consumption of computational resources. Although knowledge distillation has been widely applied to single-modality models, how to efficiently extend knowledge distillation to vision-language foundation models with extensive data remains relatively unexplored. In this paper, we introduce CLIP-CID, a novel distillation mechanism that effectively transfers knowledge from a large vision-language foundation model to a smaller model. We first propose a simple but efficient image semantic balance method to reduce transfer learning bias and improve distillation efficiency. This method filters out 43.7% of image-text pairs from LAION400M while maintaining superior…
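The abstract names two ingredients, a data filter ("image semantic balance") and a distillation objective ("cluster-instance discrimination"), but the truncated text above does not specify either. The Python sketch below is only an illustration of the general idea under stated assumptions, not the authors' implementation: `semantic_balance_filter` is a hypothetical greedy near-duplicate filter on embeddings, and `cluster_instance_loss` is a hypothetical two-term loss (teacher-student embedding alignment plus classification against cluster centroids with teacher-derived pseudo-labels). The function names, the `threshold` and `temperature` parameters, and the exact form of each term are all assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def semantic_balance_filter(embeddings, threshold=0.9):
    """Greedy near-duplicate filter (illustrative stand-in, not the paper's method).

    Keeps a sample only if its cosine similarity to every sample kept so far
    is at or below `threshold`. Returns the indices of the kept samples.
    """
    emb = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    kept = []
    for i in range(emb.shape[0]):
        if not kept or np.max(emb[kept] @ emb[i]) <= threshold:
            kept.append(i)
    return kept


def cluster_instance_loss(student, teacher, centroids, temperature=0.1):
    """Hypothetical two-term distillation loss (assumed form).

    Instance term: mean (1 - cosine) between student and teacher embeddings.
    Cluster term: cross-entropy of student-to-centroid logits against
    pseudo-labels given by each teacher embedding's nearest centroid.
    """
    s = l2_normalize(np.asarray(student, dtype=np.float64))
    t = l2_normalize(np.asarray(teacher, dtype=np.float64))
    c = l2_normalize(np.asarray(centroids, dtype=np.float64))

    instance = float(np.mean(1.0 - np.sum(s * t, axis=1)))

    pseudo_labels = np.argmax(t @ c.T, axis=1)
    logits = (s @ c.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    cluster = float(-np.mean(log_probs[np.arange(len(s)), pseudo_labels]))
    return instance, cluster
```

In this sketch, a student that reproduces the teacher's embeddings drives the instance term to zero, while the cluster term rewards placing each sample nearest the centroid the teacher assigns it to. How the paper actually weights or defines these terms is not stated in the text above.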

Taxonomy

Topics: Advanced Control Systems Optimization · Chemical Synthesis and Reactions · Phytochemical Studies and Bioactivities

Methods: Knowledge Distillation · Contrastive Language-Image Pre-training