GeneCIS: A Benchmark for General Conditional Image Similarity

Sagar Vaze; Nicolas Carion; Ishan Misra

arXiv:2306.07969·cs.CV·June 14, 2023·2 cites

Open Access

TL;DR

GeneCIS introduces a benchmark for evaluating models' ability to adapt to various notions of image similarity in a zero-shot setting, revealing limitations of current models and proposing a scalable improvement method.

Contribution

The paper presents GeneCIS, a new benchmark that measures how well models adapt to different similarity conditions in a zero-shot setting, and a scalable method for improving zero-shot conditional image similarity performance.

Findings

01. Powerful CLIP models perform poorly on GeneCIS.

02. Performance on GeneCIS is weakly correlated with ImageNet accuracy.

03. The proposed method improves zero-shot similarity and surpasses supervised models on MIT-States.

Abstract

We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically. This contrasts with most representation learning methods, supervised or self-supervised, which learn a fixed embedding function and hence implicitly assume a single notion of similarity. For instance, models trained on ImageNet are biased towards object categories, while a user might prefer the model to focus on colors, textures or specific elements in the scene. In this paper, we propose the GeneCIS ('genesis') benchmark, which measures models' ability to adapt to a range of similarity conditions. Extending prior work, our benchmark is designed for zero-shot evaluation only, and hence considers an open-set of similarity conditions. We find that baselines from powerful CLIP models struggle on GeneCIS and that performance on the benchmark is only weakly…
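
To make the idea of a similarity condition concrete, here is a minimal sketch of conditional retrieval scored with Recall@K. It is not the paper's method: the `combine` function (averaging two unit vectors), the helper names, and the 2-D toy embeddings are all assumptions made for illustration. In practice the embeddings would come from an image-text model such as CLIP.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def combine(image_emb, condition_emb):
    """Hypothetical combiner: average the two unit vectors, then renormalize."""
    img, cond = normalize(image_emb), normalize(condition_emb)
    return normalize([(x + y) / 2 for x, y in zip(img, cond)])


def rank_gallery(query_emb, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query_emb, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranking, target_index, k):
    """1.0 if the target appears in the top-k results, else 0.0."""
    return 1.0 if target_index in ranking[:k] else 0.0


# Toy 2-D embeddings (illustrative only): axis 0 ~ object, axis 1 ~ condition.
reference_image = [1.0, 0.0]   # the query image
condition_text = [0.0, 1.0]    # the similarity condition
gallery = [
    [1.0, 0.0],   # same object, ignores the condition
    [0.7, 0.7],   # matches object and condition (the target)
    [0.0, 1.0],   # matches the condition only
]
target = 1

# A fixed embedding ignores the condition; the combined query uses it.
unconditioned_ranking = rank_gallery(reference_image, gallery)
conditioned_ranking = rank_gallery(combine(reference_image, condition_text), gallery)

print("unconditioned Recall@1:", recall_at_k(unconditioned_ranking, target, 1))
print("conditioned Recall@1:", recall_at_k(conditioned_ranking, target, 1))
```

In this toy setup the fixed embedding retrieves the image that matches only the object, while the condition-aware query retrieves the target. That is the kind of gap a conditional similarity benchmark is meant to expose.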

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training · Focus