Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Jon Almaz\'an, Byungsoo Ko, Geonmo Gu, Diane Larlus, Yannis Kalantidis

TL;DR
This paper introduces Grappa, an unsupervised method that adapts a pretrained image retrieval model to multiple tasks simultaneously by using pseudo-labels of varying granularity, improving zero-shot performance across diverse domains.
Contribution
It proposes a novel multi-granularity adaptation approach that leverages pseudo-labels and fusion layers to unify multiple retrieval tasks into a single versatile model.
Findings
Improves zero-shot performance on six heterogeneous retrieval tasks.
Reaches or surpasses oracle performance in some cases.
Effectively utilizes unlabeled images for multi-task adaptation.
Abstract
Strong image search models can be learned for a specific domain, ie. set of labels, provided that some labeled images of that domain are available. A practical visual search model, however, should be versatile enough to solve multiple retrieval tasks simultaneously, even if those cover very different specialized domains. Additionally, it should be able to benefit from even unlabeled images from these various retrieval tasks. This is the more practical scenario that we consider in this paper. We address it with the proposed Grappa, an approach that starts from a strong pretrained model, and adapts it to tackle multiple retrieval tasks concurrently, using only unlabeled images from the different task domains. We extend the pretrained model with multiple independently trained sets of adaptors that use pseudo-label sets of different sizes, effectively mimicking different…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
