CRoF: CLIP-based Robust Few-shot Learning on Noisy Labels
Shizhuo Deng, Bowen Han, Jiaqi Chen, Hao Wang, Dongyue Chen, Tong Jia

TL;DR
This paper introduces CRoF, a plug-in module for CLIP that improves few-shot learning robustness on noisy labels by using discriminative prompts and a weighted fine-tuning strategy, outperforming existing methods.
Contribution
CRoF is a novel plug-in that enhances CLIP's domain generalization on noisy data through task-oriented prompts and a weighted multi-label loss.
Findings
CRoF outperforms fine-tuned and vanilla CLIP on various noise types.
Discriminative prompts increase inter-class textual embedding distances.
Weighted multi-label loss improves robustness under noisy labels.
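This summary does not give the exact form of the weighted multi-label loss, so the following is only a generic sketch of the idea, not the authors' formulation. It assumes each sample carries a small set of candidate labels (for example, the given, possibly noisy label plus a top CLIP match), each with a non-negative weight, and that the loss is a weighted cross-entropy over those candidates. The function name, argument layout, and weighting scheme are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def weighted_multilabel_loss(logits, candidate_labels, candidate_weights):
    """Weighted cross-entropy over several candidate labels per sample.

    Hypothetical illustration, not the loss defined in the paper.
    logits:            (N, C) float array of class scores
    candidate_labels:  (N, K) int array of candidate class indices
    candidate_weights: (N, K) float array, each row assumed to sum to 1
    """
    log_probs = np.log(softmax(logits) + 1e-12)
    # Log-probability of each candidate label for each sample.
    picked = np.take_along_axis(log_probs, candidate_labels, axis=1)
    per_sample = -(candidate_weights * picked).sum(axis=1)
    return float(per_sample.mean())


# One sample whose given label (class 1) disagrees with the top-scoring class 0.
logits = np.array([[2.0, 0.5, 0.1]])
candidates = np.array([[1, 0]])

trust_given = weighted_multilabel_loss(logits, candidates, np.array([[1.0, 0.0]]))
share_trust = weighted_multilabel_loss(logits, candidates, np.array([[0.5, 0.5]]))
print(trust_given, share_trust)
```

Spreading weight across candidates means a single wrong label no longer fully determines the training target, which is the intuition behind using a weighted multi-label objective under label noise.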
Abstract
Noisy labels threaten the robustness of few-shot learning (FSL) due to the inexact features in a new domain. CLIP, a large-scale vision-language model, performs well in FSL by matching image and text embeddings, but it remains susceptible to misclassification caused by noisy labels. Enhancing the domain generalization of CLIP on noisy data within FSL tasks is therefore a critical challenge. In this paper, we offer a new perspective on mitigating the influence of noisy labels: CLIP-based Robust Few-shot learning (CRoF). CRoF is a general plug-in module for CLIP-based models. To avoid misclassification and confused label embeddings, we design a few-shot task-oriented prompt generator that gives more discriminative descriptions of each category. The proposed prompts achieve larger inter-class distances between textual embeddings. Furthermore, rather than fully trusting zero-shot classification by CLIP, we fine-tune…
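The claim about larger inter-class distances can be checked with a simple measurement. The sketch below assumes one text embedding per class and uses the mean pairwise cosine distance as the separation score. The abstract does not say which distance the authors report, so the metric, the function name, and the toy vectors are illustrative assumptions rather than the paper's evaluation protocol.

```python
import numpy as np


def mean_interclass_cosine_distance(text_embeddings):
    """Mean cosine distance over all distinct pairs of class embeddings.

    text_embeddings: (C, D) array with one row per class prompt.
    Larger values mean the class prompts are further apart.
    """
    x = np.asarray(text_embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    similarity = x @ x.T
    # Upper triangle without the diagonal: each class pair counted once.
    rows, cols = np.triu_indices(len(x), k=1)
    return float((1.0 - similarity[rows, cols]).mean())


# Toy vectors standing in for embeddings of generic vs. more descriptive prompts.
generic_prompts = np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]])
descriptive_prompts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

print(mean_interclass_cosine_distance(generic_prompts))
print(mean_interclass_cosine_distance(descriptive_prompts))
```

In practice the rows would come from a CLIP text encoder applied to each class prompt; comparing the score for baseline prompts against the score for generated prompts gives a direct test of the separation claim.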
Taxonomy
Topics: AI and Multimedia in Education · Educational Technology and Assessment · Ideological and Political Education
Methods: Contrastive Language-Image Pre-training
