MoDE: CLIP Data Experts via Clustering
Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu

TL;DR
MoDE introduces a clustering-based approach to create specialized CLIP data experts, improving zero-shot image classification performance while reducing training costs and enabling flexible, asynchronous training of data experts.
Contribution
The paper proposes MoDE, a novel clustering-based method to train multiple CLIP data experts, enhancing robustness to noisy data and reducing training costs compared to standard CLIP models.
Findings
Four CLIP data experts on ViT-B/16 outperform the larger ViT-L/14 models from OpenAI CLIP and OpenCLIP on zero-shot image classification.
MoDE achieves this with less than 35% of the training cost of those larger models.
The system supports asynchronous training and easy inclusion of new data experts.
Abstract
The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies…
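The abstract describes two mechanisms: grouping fine-grained cluster centers into a small number of coarse clusters (one data expert each), and weighting the experts at inference time by how well the task metadata matches each expert's cluster conditions. The sketch below illustrates only that clustering and routing logic in plain Python; it is not the authors' implementation. Assumptions: clustering runs on unit-normalized caption embeddings, k-means uses dot-product similarity with first-k initialization, an expert's score is the best match between a task-metadata embedding (for example, a class-name embedding) and any of its fine-grained centers, and the scores are turned into weights with a softmax whose `temperature` is a hypothetical parameter. Training the experts themselves is omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def kmeans(points, k, iters=20):
    """Spherical k-means on unit vectors (assumption: first k points as init)."""
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the most similar center (dot product on unit vectors).
        assign = [max(range(k), key=lambda j: dot(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = normalize([sum(col) / len(members) for col in zip(*members)])
    return centers, assign

def expert_weights(task_embs, fine_centers, fine_to_expert, n_experts, temperature=0.05):
    """Hypothetical routing: each expert is represented by its fine-grained centers."""
    scores = [0.0] * n_experts
    for t in task_embs:
        best = [-math.inf] * n_experts
        for c, e in zip(fine_centers, fine_to_expert):
            best[e] = max(best[e], dot(t, c))
        for e in range(n_experts):
            scores[e] += best[e] / len(task_embs)  # average over task metadata
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]  # stable softmax
    z = sum(exps)
    return [x / z for x in exps]

def ensemble_logits(expert_logits, weights):
    """Weighted sum of per-expert class logits."""
    n_classes = len(expert_logits[0])
    return [sum(w * l[c] for w, l in zip(weights, expert_logits)) for c in range(n_classes)]

# Toy 2-D "caption embeddings": one group near the x-axis, one near the y-axis.
captions = [normalize(v) for v in
            [[1, 0.1], [1, 0.2], [0.9, 0.3], [0.1, 1], [0.2, 1], [0.3, 0.9]]]
fine_centers, _ = kmeans(captions, k=4)                   # fine-grained clusters
_, fine_to_expert = kmeans(fine_centers, k=2)             # coarse clusters = experts
weights = expert_weights([normalize([1, 0.15])], fine_centers, fine_to_expert, n_experts=2)
combined = ensemble_logits([[2.0, 0.5], [0.1, 1.5]], weights)
```

In the toy run, the task embedding lies near the x-axis group, so nearly all of the weight goes to the expert whose fine-grained centers cover that region. Representing each coarse expert by several fine-grained centers, rather than by a single coarse centroid, is what lets the match stay semantically precise while the number of experts stays small.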
Taxonomy
Topics: Data Mining Algorithms and Applications
Methods: Contrastive Language-Image Pre-training · Ontology
