Improving Zero-Shot Models with Label Distribution Priors
Jonathan Kahana, Niv Cohen, Yedid Hoshen

TL;DR
This paper introduces CLIPPR, a method that adapts zero-shot vision-language models such as CLIP to regression and classification tasks using prior knowledge of the label distribution. Without any labeled data, it reports a 28% improvement in mean absolute error on UTK age regression and a 2.83% increase in classification accuracy on ImageNet.
Contribution
The paper proposes CLIPPR (CLIP with Priors), which trains an adapter network on top of CLIP so that its predictions stay close to the original zero-shot predictions while their overall distribution matches an assumed prior over the labels. No annotated images are required.
Findings
28% improvement in mean absolute error on UTK age regression
2.83% increase in classification accuracy on ImageNet
Effective adaptation of zero-shot models for supervised-like tasks
Abstract
Labeling large image datasets with attributes such as facial age or object type is tedious and sometimes infeasible. Supervised machine learning methods provide a highly accurate solution, but require manual labels which are often unavailable. Zero-shot models (e.g., CLIP) do not require manual labels but are not as accurate as supervised ones, particularly when the attribute is numeric. We propose a new approach, CLIPPR (CLIP with Priors), which adapts zero-shot models for regression and classification on unlabelled datasets. Our method does not use any annotated images. Instead, we assume a prior over the label distribution in the dataset. We then train an adapter network on top of CLIP under two competing objectives: i) minimal change of predictions from the original CLIP model, and ii) minimal distance between the predicted and prior distributions of labels. Additionally, we present a novel…
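The two competing objectives described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss functions, so the choice of cross-entropy against CLIP's own hard predictions for objective (i), KL divergence between the prior and the mean predicted distribution for objective (ii), and the `weight` parameter are all assumptions made for illustration. The function names are hypothetical, and only the classification case is shown.

```python
import math

def mean_distribution(probs):
    """Average per-image class probabilities into one dataset-level distribution."""
    n = len(probs)
    k = len(probs[0])
    return [sum(p[j] for p in probs) / n for j in range(k)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def fidelity_loss(adapter_probs, clip_probs, eps=1e-12):
    """Objective (i), assumed form: cross-entropy of the adapter's predictions
    against the class that the original zero-shot model ranks highest."""
    total = 0.0
    for a, c in zip(adapter_probs, clip_probs):
        pseudo_label = max(range(len(c)), key=lambda j: c[j])
        total -= math.log(a[pseudo_label] + eps)
    return total / len(adapter_probs)

def prior_loss(adapter_probs, prior):
    """Objective (ii), assumed form: KL divergence between the assumed label
    prior and the adapter's average predicted distribution."""
    return kl_divergence(prior, mean_distribution(adapter_probs))

def combined_loss(adapter_probs, clip_probs, prior, weight=1.0):
    """Weighted sum of the two objectives; `weight` is a hypothetical knob."""
    return fidelity_loss(adapter_probs, clip_probs) + weight * prior_loss(adapter_probs, prior)

# Toy example: the zero-shot model over-predicts class 0, the prior is uniform.
clip_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
prior = [0.5, 0.5]

# An adapter that copies the zero-shot model has low fidelity loss but a nonzero prior loss.
print(combined_loss(clip_probs, clip_probs, prior))
```

In a real setup the adapter would be a trainable network on top of CLIP features, and a loss of this kind would be minimized by gradient descent. The sketch only shows how the two terms pull in different directions: copying CLIP minimizes the first term, while matching the prior minimizes the second.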
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Adapter · Contrastive Language-Image Pre-training
