Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval
Mohammad Omama, Po-han Li, Sandeep P. Chinchali

TL;DR
This paper introduces AE-SVC and (SS)$_2$D, two novel methods that enhance the scalability and efficiency of image retrieval systems by improving foundation model embeddings and optimizing their size-performance trade-offs.
Contribution
It proposes AE-SVC for better foundation model embeddings and (SS)$_2$D for adaptive embedding sizes, addressing key challenges in scalable and efficient image retrieval.
Findings
AE-SVC improves retrieval performance by up to 16%.
(SS)$_2$D enhances performance by 10% for smaller embeddings.
Experiments were conducted on four datasets with four foundation models.
Abstract
Image retrieval is crucial in robotics and computer vision, with downstream applications in robot place recognition and vision-based product recommendations. Modern retrieval systems face two key challenges: scalability and efficiency. State-of-the-art image retrieval systems train specific neural networks for each dataset, an approach that lacks scalability. Furthermore, since retrieval speed is directly proportional to embedding size, existing systems that use large embeddings lack efficiency. To tackle scalability, recent works propose using off-the-shelf foundation models. However, these models, though applicable across datasets, fall short in achieving performance comparable to that of dataset-specific models. Our key observation is that, while foundation models capture necessary subtleties for effective retrieval, the underlying distribution of their embedding space can negatively…
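To make the abstract's efficiency point concrete, here is a minimal sketch of brute-force cosine-similarity retrieval, showing why search cost grows with embedding size. It is not the paper's AE-SVC or (SS)$_2$D method. The function names (`l2_normalize`, `retrieve_top_k`), the random data, and the naive "keep the first `dim` coordinates" truncation are all assumptions made for illustration only.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def retrieve_top_k(queries, database, k=5, dim=None):
    """Return indices of the k most similar database rows for each query.

    If `dim` is given, only the first `dim` coordinates are used. This naive
    truncation is an illustrative assumption, not the paper's technique.
    """
    if dim is not None:
        queries = queries[:, :dim]
        database = database[:, :dim]
    q = l2_normalize(queries)
    db = l2_normalize(database)
    # The similarity matrix costs O(n_queries * n_database * d) multiply-adds,
    # so the work scales linearly with the embedding size d.
    scores = q @ db.T
    return np.argsort(-scores, axis=1)[:, :k]


# Synthetic stand-ins for foundation-model embeddings (hypothetical data).
rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 512))
# Each query is a slightly perturbed copy of one database item.
queries = database[:3] + 0.01 * rng.standard_normal((3, 512))

full = retrieve_top_k(queries, database, k=5)           # all 512 dimensions
small = retrieve_top_k(queries, database, k=5, dim=64)  # first 64 dimensions
```

On this synthetic data each query still finds its source item with 64 dimensions, at roughly one eighth of the multiply-adds. Real foundation-model embeddings generally lose accuracy under naive truncation, which is the size-performance trade-off the paper targets.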
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Data Management and Algorithms
Methods: Contrastive Language-Image Pre-training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
