CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification
Rabab Abdelfattah, Qing Guo, Xiaoguang Li, Xiaofeng Wang, and Song Wang

TL;DR
This paper introduces CDUL, an unsupervised multi-label image classification method that aggregates CLIP's global (whole-image) and local (snippet-level) image-text similarities to generate pseudo labels, achieving state-of-the-art results among unsupervised methods without any annotations.
Contribution
It extends CLIP for multi-label prediction using global-local similarity aggregation and proposes an optimization framework for pseudo label refinement.
Findings
Outperforms existing unsupervised methods on multiple datasets.
Achieves results comparable to weakly supervised methods.
Effective pseudo label generation without annotations.
Abstract
This paper presents a CLIP-based unsupervised learning method for annotation-free multi-label image classification, including three stages: initialization, training, and inference. At the initialization stage, we take full advantage of the powerful CLIP model and propose a novel approach to extend CLIP for multi-label predictions based on global-local image-text similarity aggregation. To be more specific, we split each image into snippets and leverage CLIP to generate the similarity vector for the whole image (global) as well as each snippet (local). Then a similarity aggregator is introduced to leverage the global and local similarity vectors. Using the aggregated similarity scores as the initial pseudo labels at the training stage, we propose an optimization framework to train the parameters of the classification network and refine pseudo labels for unobserved labels. During…
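The initialization stage described above (split the image into snippets, compute a global and several local similarity vectors, then aggregate them) can be illustrated with a minimal NumPy sketch. Several parts are assumptions and not taken from the paper: the encoder is a hypothetical random projection standing in for CLIP's image encoder, the text embeddings are random placeholders for CLIP's class-prompt embeddings, the 3x3 grid and the temperature are arbitrary, and the aggregation rule (averaging the global score with the per-class maximum over snippets) is one simple choice, since the abstract does not specify the aggregator.

```python
import numpy as np

def split_into_snippets(image, grid=3):
    """Split an H x W x C array into grid*grid non-overlapping snippets."""
    h, w = image.shape[0] // grid, image.shape[1] // grid
    return [image[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(grid) for c in range(grid)]

def similarity_vector(image_emb, text_embs, temperature=0.01):
    """Softmax over cosine similarities between one image embedding
    and the text embeddings of all classes."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = text_embs @ image_emb / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def aggregate(global_sim, local_sims):
    """Assumed aggregator (not the paper's): average the global score
    with the per-class maximum over all snippets."""
    return 0.5 * (global_sim + np.max(local_sims, axis=0))

rng = np.random.default_rng(0)
num_classes, dim = 5, 16
image = rng.random((96, 96, 3))                # placeholder image
text_embs = rng.normal(size=(num_classes, dim))  # placeholder class embeddings
proj = rng.normal(size=(dim, 3))

def encode(patch):
    """Hypothetical stand-in for CLIP's image encoder."""
    return proj @ patch.mean(axis=(0, 1))

global_sim = similarity_vector(encode(image), text_embs)
snippets = split_into_snippets(image)
local_sims = np.stack([similarity_vector(encode(s), text_embs) for s in snippets])
pseudo_labels = aggregate(global_sim, local_sims)  # initial soft pseudo labels
```

In this sketch, taking the maximum over snippets lets a class that is visible in only one region of the image still receive a high local score, which is the motivation for looking at snippets at all; a real pipeline would replace `encode` and `text_embs` with actual CLIP outputs.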
Taxonomy
Topics: Text and Document Classification Technologies · Image Retrieval and Classification Techniques · Machine Learning in Bioinformatics
Methods: Contrastive Language-Image Pre-training
