InDiReCT: Language-Guided Zero-Shot Deep Metric Learning for Images
Konstantin Kobs, Michael Steininger, Andreas Hotho

TL;DR
InDiReCT introduces a zero-shot deep metric learning approach that uses natural language prompts to customize image similarity notions without training data, leveraging CLIP embeddings for flexible, application-specific image retrieval.
Contribution
The paper proposes InDiReCT, a novel method for language-guided zero-shot deep metric learning that enables customizable image similarity measures using only text prompts and CLIP embeddings.
Findings
InDiReCT outperforms strong baselines on five datasets.
It approaches the performance of fully supervised models.
The method learns to focus on relevant image regions based on language cues.
Abstract
Common Deep Metric Learning (DML) datasets specify only one notion of similarity, e.g., two images in the Cars196 dataset are deemed similar if they show the same car model. We argue that depending on the application, users of image retrieval systems have different and changing similarity notions that should be incorporated as easily as possible. Therefore, we present Language-Guided Zero-Shot Deep Metric Learning (LanZ-DML) as a new DML setting in which users control the properties that should be important for image representations without training data by only using natural language. To this end, we propose InDiReCT (Image representations using Dimensionality Reduction on CLIP embedded Texts), a model for LanZ-DML on images that exclusively uses a few text prompts for training. InDiReCT utilizes CLIP as a fixed feature extractor for images and texts and transfers the variation in text…
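The idea in the abstract, fitting a dimensionality reduction on embedded text prompts and reusing it for image embeddings, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract is truncated here, so the exact reduction procedure is not specified; plain PCA (via SVD) is used as a stand-in. The random arrays stand in for CLIP text and image embeddings, and the function names `fit_text_projection` and `embed_images` are hypothetical.

```python
import numpy as np


def fit_text_projection(text_emb: np.ndarray, n_components: int) -> np.ndarray:
    """Fit a linear projection of shape (d, n_components) on text embeddings.

    Assumption: plain PCA stands in for the paper's dimensionality reduction.
    """
    centered = text_emb - text_emb.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T


def embed_images(image_emb: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project image embeddings and L2-normalize them for cosine similarity."""
    reduced = image_emb @ projection
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)
# Placeholders: in practice these would come from a frozen CLIP model, with one
# text embedding per prompt that varies the property of interest.
text_emb = rng.normal(size=(20, 512))
image_emb = rng.normal(size=(5, 512))

projection = fit_text_projection(text_emb, n_components=8)
reduced = embed_images(image_emb, projection)
similarity = reduced @ reduced.T  # pairwise cosine similarity between images
```

Only the text embeddings are used to fit the projection. The images are transformed afterwards, which is what makes the setting zero-shot with respect to image training data.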
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research
Methods: Contrastive Language-Image Pre-training
