CLIP-driven Zero-shot Learning with Ambiguous Labels
Jinfu Fan, Jiangnan Li, Xiaowen Yan, Xiaohui Zhong, Wenpeng Lu, Linqing Huang

TL;DR
This paper introduces CLIP-PZSL, a framework that leverages CLIP to handle ambiguous labels in zero-shot learning, progressively refining label relevance and improving recognition of unseen classes.
Contribution
The paper proposes a CLIP-driven partial-label ZSL method that combines a semantic mining block with a partial zero-shot loss to address label ambiguity in real-world scenarios.
Findings
Outperforms existing ZSL methods on multiple datasets
Effectively refines labels and improves semantic alignment over training
Demonstrates robustness to noisy and ambiguous labels
Abstract
Zero-shot learning (ZSL) aims to recognize unseen classes by leveraging semantic information from seen classes, but most existing methods assume accurate class labels for training instances. In real-world scenarios, however, noisy and ambiguous labels can significantly reduce ZSL performance. To address this, we propose a new CLIP-driven partial label zero-shot learning (CLIP-PZSL) framework to handle label ambiguity. First, we use CLIP to extract instance and label features. Then, a semantic mining block fuses these features to extract discriminative label embeddings. We also introduce a partial zero-shot loss, which assigns weights to candidate labels based on their relevance to the instance and aligns instance and label embeddings to minimize semantic mismatch. As training progresses, the ground-truth labels are progressively identified, and the refined labels and label…
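The abstract describes a loss that weights each candidate label by its relevance to the instance and aligns instance and label embeddings. The paper's exact formulation is not given on this page, so the following is only a minimal NumPy sketch of one common way to do this: a softmax over cosine similarities restricted to the candidate set provides the weights, and a weighted cross-entropy over all classes provides the alignment term. The function names, the temperature value, and the use of cosine similarity are assumptions made for illustration, not the authors' method; random vectors stand in for CLIP features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; entries of -inf receive probability 0."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cosine_similarity(instances, labels):
    """Pairwise cosine similarity between instance and label embeddings."""
    a = instances / np.linalg.norm(instances, axis=1, keepdims=True)
    b = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    return a @ b.T


def candidate_weights(sim, candidate_mask, temperature=0.1):
    """Relevance weights over candidate labels only (assumed form).

    Each row of candidate_mask must contain at least one True entry.
    """
    masked = np.where(candidate_mask, sim / temperature, -np.inf)
    return softmax(masked, axis=1)


def partial_label_loss(sim, candidate_mask, temperature=0.1):
    """Weighted cross-entropy: candidates with higher relevance count more."""
    weights = candidate_weights(sim, candidate_mask, temperature)
    log_probs = np.log(softmax(sim / temperature, axis=1))
    return float(-(weights * log_probs).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    instance_emb = rng.normal(size=(4, 8))  # stand-in for CLIP image features
    label_emb = rng.normal(size=(5, 8))     # stand-in for CLIP text features
    mask = np.zeros((4, 5), dtype=bool)
    mask[:, 0] = True                        # every instance lists class 0
    mask[np.arange(4), [1, 2, 3, 4]] = True  # plus one other candidate each
    sim = cosine_similarity(instance_emb, label_emb)
    print(candidate_weights(sim, mask))
    print(partial_label_loss(sim, mask))
```

In a training loop, the weights would typically be recomputed from the current embeddings at each step, so that the candidate most consistent with the instance gradually dominates. This mirrors the progressive label identification the abstract mentions, under the assumptions stated above.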
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Text and Document Classification Technologies
