GeneCIS: A Benchmark for General Conditional Image Similarity
Sagar Vaze, Nicolas Carion, Ishan Misra

TL;DR
GeneCIS introduces a benchmark for evaluating models' ability to adapt to various notions of image similarity in a zero-shot setting, revealing limitations of current models and proposing a scalable improvement method.
Contribution
The paper presents GeneCIS, a zero-shot benchmark that measures how well models adapt to different notions of image similarity, and a scalable method for improving zero-shot conditional similarity performance.
Findings
Baselines built on powerful CLIP models struggle on GeneCIS.
Performance on GeneCIS is weakly correlated with ImageNet accuracy.
The proposed method improves zero-shot similarity and surpasses supervised models on MIT-States.
Abstract
We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically. This contrasts with most representation learning methods, supervised or self-supervised, which learn a fixed embedding function and hence implicitly assume a single notion of similarity. For instance, models trained on ImageNet are biased towards object categories, while a user might prefer the model to focus on colors, textures or specific elements in the scene. In this paper, we propose the GeneCIS ('genesis') benchmark, which measures models' ability to adapt to a range of similarity conditions. Extending prior work, our benchmark is designed for zero-shot evaluation only, and hence considers an open-set of similarity conditions. We find that baselines from powerful CLIP models struggle on GeneCIS and that performance on the benchmark is only weakly…
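To make the idea of a similarity condition concrete, here is a minimal sketch of conditional retrieval on toy vectors. It is not the authors' method or the benchmark's evaluation code. The fusion rule (summing the normalized image and condition embeddings), the function names, and the 2-D embeddings are all assumptions chosen for illustration. In practice the embeddings would come from a vision-language model such as CLIP.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def conditional_query(image_emb, condition_emb):
    """Fuse an image embedding with a condition embedding.

    Assumption: a simple sum of normalized embeddings stands in for
    whatever fusion a real model would learn.
    """
    return l2_normalize(l2_normalize(image_emb) + l2_normalize(condition_emb))


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity."""
    sims = l2_normalize(gallery) @ l2_normalize(query)
    return np.argsort(-sims)


def recall_at_k(ranking, target_index, k):
    """1.0 if the target appears in the top-k results, else 0.0."""
    return float(target_index in ranking[:k])


# Toy data (hypothetical): axis 0 ~ "object identity", axis 1 ~ "attribute".
image_emb = np.array([1.0, 0.0])       # the reference image
condition_emb = np.array([0.0, 1.0])   # the similarity condition
gallery = np.array([
    [1.0, 0.0],    # same object, ignores the condition
    [1.0, 1.0],    # same object and matches the condition
    [0.0, -1.0],   # unrelated
])

unconditional = rank_gallery(image_emb, gallery)
conditional = rank_gallery(conditional_query(image_emb, condition_emb), gallery)

print("unconditional ranking:", unconditional.tolist())
print("conditional ranking:  ", conditional.tolist())
print("Recall@1 for target 1:", recall_at_k(conditional, target_index=1, k=1))
```

In this toy setup, a fixed embedding ranks the gallery by object identity alone, while adding the condition moves the item that also matches the condition to the top. That is the kind of adaptation the benchmark is described as measuring.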
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Language-Image Pre-training · Focus
