QuARI: Query Adaptive Retrieval Improvement
Eric Xing, Abby Stylianou, Robert Pless, Nathan Jacobs

TL;DR
This paper introduces QuARI, a method that learns query-specific linear transformations of vision-language model features to improve large-scale image retrieval performance efficiently.
Contribution
The paper proposes predicting a separate linear transformation for each query and applying it to the vision-language model features, which improves retrieval accuracy at low computational cost.
Findings
Outperforms state-of-the-art retrieval methods
Effective for large-scale image collections
Minimal additional computation at query time
Abstract
Massive-scale pretraining has made vision-language models (VLMs) increasingly popular for image-to-image and text-to-image retrieval across a broad collection of domains. However, these models do not perform well when used for challenging retrieval tasks, such as instance retrieval in very large-scale image collections. Recent work has shown that linear transformations of VLM features trained for instance retrieval can improve performance by emphasizing subspaces that relate to the domain of interest. In this paper, we explore a more extreme version of this specialization by learning to map a given query to a query-specific feature space transformation. Because this transformation is linear, it can be applied with minimal computational cost to millions of image embeddings, making it effective for large-scale retrieval or re-ranking. Results show that this method consistently outperforms…
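To make the idea in the abstract concrete, here is a minimal NumPy sketch of ranking a gallery after applying one linear map to both the query and the precomputed image embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dimensions, and the random `transform` matrix are all hypothetical, and the random matrix merely stands in for whatever transformation the trained model would predict from the query.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along the last axis to unit Euclidean length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_with_query_transform(query, gallery, transform, top_k=5):
    """Rank gallery items by cosine similarity in a transformed space.

    query:     (d,)   embedding of the query
    gallery:   (n, d) precomputed image embeddings
    transform: (d, k) query-specific linear map (assumed to come from a
               learned predictor; a placeholder is used in the demo below)
    """
    q = l2_normalize(query @ transform)      # (k,)
    g = l2_normalize(gallery @ transform)    # (n, k): one matrix multiply
    scores = g @ q                           # (n,) cosine similarities
    order = np.argsort(-scores)[:top_k]      # indices of the best matches
    return order, scores[order]


# Toy data: random embeddings, with the query a slightly noisy copy of item 42.
rng = np.random.default_rng(0)
d, k, n = 64, 16, 1000
gallery = rng.standard_normal((n, d))
query = gallery[42] + 0.01 * rng.standard_normal(d)

# Placeholder for a predicted query-specific transformation (hypothetical).
transform = rng.standard_normal((d, k))

top_idx, top_scores = rank_with_query_transform(query, gallery, transform)
print(top_idx, top_scores)
```

Because the map is linear, adapting the whole gallery to a query is a single matrix product over embeddings that were computed once offline, which is why this kind of adaptation remains practical for re-ranking large collections.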
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
