Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
Zheyuan Liu, Weixuan Sun, Damien Teney, Stephen Gould

TL;DR
This paper introduces a two-stage method for composed image retrieval: a fast first stage prunes the corpus to a small candidate set, and a dual multi-modal encoder then re-ranks those candidates, improving accuracy on large-scale datasets.
Contribution
The paper proposes a two-stage framework that pairs fast candidate pruning with dual-encoder re-ranking, leveraging pre-trained vision-and-language models to improve retrieval performance.
Findings
Outperforms state-of-the-art methods on standard benchmarks.
Effectively balances efficiency and discriminative power.
Demonstrates significant accuracy improvements on large-scale datasets.
Abstract
Composed image retrieval aims to find an image that best matches a given multi-modal user query consisting of a reference image and text pair. Existing methods commonly pre-compute image embeddings over the entire corpus and compare these to a reference image embedding modified by the query text at test time. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially independent of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, for large-scale datasets the computational cost is prohibitive since pre-computation of candidate…
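The trade-off the abstract describes, cheap vector comparison over the whole corpus versus costlier per-candidate scoring, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`prune`, `rerank`, `two_stage_retrieve`), the use of cosine similarity, and the `pair_score` callback are assumptions made for illustration. In particular, `pair_score` is a hypothetical stand-in for the dual multi-modal encoder, which in the paper scores reference-text-candidate triplets, and `query_emb` is assumed to already be the reference image embedding modified by the query text.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune(query_emb: Vector, corpus_embs: List[Vector], k: int) -> List[int]:
    """Stage 1: rank the whole corpus by a fast vector similarity, keep the top k.

    corpus_embs are assumed to be pre-computed once, offline.
    """
    scores = [cosine(query_emb, emb) for emb in corpus_embs]
    order = sorted(range(len(corpus_embs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def rerank(candidate_ids: List[int], pair_score: Callable[[int], float]) -> List[int]:
    """Stage 2: re-order only the shortlisted candidates with a costlier scorer.

    pair_score is a hypothetical placeholder for a model that looks at the
    query and one candidate together.
    """
    return sorted(candidate_ids, key=pair_score, reverse=True)


def two_stage_retrieve(
    query_emb: Vector,
    corpus_embs: List[Vector],
    pair_score: Callable[[int], float],
    k: int,
) -> List[int]:
    """Prune to k candidates, then re-rank only those k."""
    shortlist = prune(query_emb, corpus_embs, k)
    return rerank(shortlist, pair_score)


# Toy data: four 2-D "image embeddings" and made-up stage-2 scores.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.2]
toy_pair_scores = {0: 0.2, 1: 0.9, 3: 0.5}

result = two_stage_retrieve(
    query, corpus, lambda i: toy_pair_scores.get(i, 0.0), k=3
)
print(result)  # [1, 3, 0]
```

In this toy run the first stage discards candidate 2, so the costlier scorer is called on three items rather than four. With a real corpus the shortlist size `k` controls how much of the per-candidate cost is paid.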
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Pruning · Test
