Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification
Manan Shah, Yash Bhalgat

TL;DR
This paper conducts a reproducibility study of CDUL, an unsupervised multi-label image classification method driven by CLIP, by providing open-source code and verifying the effectiveness of its key components.
Contribution
It offers a reproducible implementation and validation of the novel aggregation and training strategies introduced in the original CDUL paper.
Findings
A reproducible, open-source implementation of CDUL is provided
The effectiveness of CLIP-based pseudo-label initialization is tested
The gradient-alignment training method is tested
Abstract
This report is a reproducibility study of the paper "CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification" (Abdelfattah et al., ICCV 2023). It makes the following contributions: (1) We provide a reproducible, well-commented, open-source implementation of the entire method specified in the original paper. (2) We attempt to verify the effectiveness of the novel aggregation strategy, which uses the CLIP model to initialize the pseudo labels for the subsequent unsupervised multi-label image classification task. (3) We attempt to verify the effectiveness of the gradient-alignment training method specified in the original paper, which is used to update the network parameters and the pseudo labels. The code is available at https://github.com/cs-mshah/CDUL
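To make the idea of CLIP-initialized pseudo labels concrete, here is a minimal sketch of how image-text similarities can be turned into soft per-class scores. It is not the implementation from the repository or the original paper: the function names, the toy 3-dimensional embeddings, and the temperature of 0.01 are all assumptions chosen for illustration, and the paper's aggregation strategy is not reproduced here. In practice the embeddings would come from CLIP's image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def init_pseudo_labels(image_emb, class_text_embs, temperature=0.01):
    """Hypothetical helper: score one image embedding against one text
    embedding per class and return soft pseudo labels.
    The temperature value is an assumption, not taken from the paper."""
    sims = [cosine(image_emb, t) for t in class_text_embs]
    return softmax(sims, temperature)

# Toy stand-ins for CLIP embeddings (assumed values, 3 classes).
image_emb = [0.9, 0.1, 0.0]
class_text_embs = [
    [1.0, 0.0, 0.0],  # class 0 prompt embedding
    [0.0, 1.0, 0.0],  # class 1 prompt embedding
    [0.0, 0.0, 1.0],  # class 2 prompt embedding
]
pseudo = init_pseudo_labels(image_emb, class_text_embs)
```

The resulting `pseudo` list holds one score per class and sums to 1. In a training loop of the kind the abstract describes, scores like these would serve only as a starting point that is later updated together with the network parameters.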
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Contrastive Language-Image Pre-training
