Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking
Shitong Sun, Fanghua Ye, Shaogang Gong

TL;DR
This paper introduces a training-free zero-shot composed image retrieval method that translates queries into explicit text and uses local concept re-ranking, achieving competitive results without costly triplet training.
Contribution
The work proposes a novel training-free approach with explicit query translation and local concept re-ranking, improving efficiency and performance in zero-shot composed image retrieval.
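The contribution describes a two-stage pattern: rank the gallery against the (text-translated) query, then re-rank the top candidates using local concepts. Below is a minimal sketch of that generic pattern, not the paper's implementation. It assumes the query has already been translated to text and embedded, and it uses a hypothetical per-image `concept_scores` array (for example, the fraction of local concepts from the query judged present in each image). The blending weight `alpha`, the cutoff `k`, and the toy embeddings are illustrative assumptions.

```python
import numpy as np

def global_rank(query_emb: np.ndarray, gallery_embs: np.ndarray):
    """Rank gallery images by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity of each gallery image to the query
    order = np.argsort(-sims, kind="stable")  # indices, most similar first
    return order, sims

def rerank_top_k(order: np.ndarray, sims: np.ndarray,
                 concept_scores: np.ndarray, k: int = 3, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical re-ranking: reorder only the top-k candidates by a blend of
    global similarity and a local concept score; the tail keeps its order."""
    top = order[:k]
    blended = (1 - alpha) * sims[top] + alpha * concept_scores[top]
    return np.concatenate([top[np.argsort(-blended, kind="stable")], order[k:]])

# Toy 2-D embeddings; in practice these would come from a vision-language model.
query = np.array([1.0, 0.0])
gallery = np.array([[1.0, 0.1], [1.0, 0.3], [0.0, 1.0], [1.0, 0.5]])
concept_scores = np.array([0.0, 1.0, 0.5, 1.0])  # hypothetical local-concept checks

order, sims = global_rank(query, gallery)
final = rerank_top_k(order, sims, concept_scores, k=3)
print(order.tolist(), final.tolist())  # [0, 1, 3, 2] [1, 3, 0, 2]
```

Image 0 is globally closest but fails the hypothetical concept check, so it drops below images 1 and 3 after re-ranking. Limiting the second stage to the top `k` keeps a more expensive local check off most of the gallery; this is the usual rationale for two-stage retrieval, offered as a general design note rather than a description of the paper's settings.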
Findings
Achieves performance comparable to state-of-the-art triplet-trained methods.
Significantly outperforms other training-free methods on multiple benchmarks.
Effective on both open-domain and fashion-specific datasets.
Abstract
Composed image retrieval attempts to retrieve an image of interest from a gallery of images through a composed query consisting of a reference image and its corresponding modified text. It has recently attracted attention because it combines information-rich images with concise language to precisely express the requirements of target images. Most current composed image retrieval methods follow a supervised learning approach, training on a costly triplet dataset composed of a reference image, modified text, and a corresponding target image. To avoid difficult-to-obtain labeled triplet training data, zero-shot composed image retrieval (ZS-CIR) has been introduced, which aims to retrieve the target image by learning from image-text pairs (self-supervised triplets), without the need for human-labeled triplets. However, this self-supervised triplet learning approach is computationally less…
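To make the task setting concrete, here is a minimal sketch of the data involved: a composed query (reference image plus modified text), a labeled triplet that adds the target image, and a check of whether the target appears among the top-K gallery results. All IDs, the text, and the similarity scores are made up; `scores` stands in for whatever query-to-gallery similarity a retrieval model would produce, and the names `ComposedQuery`, `Triplet`, `top_k`, and `is_hit` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComposedQuery:
    reference_image: str    # ID of the reference image
    modification_text: str  # text describing how the target should differ

@dataclass(frozen=True)
class Triplet:
    query: ComposedQuery
    target_image: str  # human-labeled target: the costly part of triplet data

def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k gallery image IDs with the highest similarity scores."""
    return sorted(scores, key=lambda img: scores[img], reverse=True)[:k]

def is_hit(triplet: Triplet, scores: dict[str, float], k: int) -> bool:
    """True if the labeled target appears among the top-k retrieved images."""
    return triplet.target_image in top_k(scores, k)

# Hypothetical example: scores would come from a retrieval model.
triplet = Triplet(ComposedQuery("img_ref_01", "make the dress red and sleeveless"), "img_17")
scores = {"img_03": 0.41, "img_17": 0.58, "img_22": 0.62, "img_40": 0.12}

print(top_k(scores, 2))                                      # ['img_22', 'img_17']
print(is_hit(triplet, scores, 1), is_hit(triplet, scores, 2))  # False True
```

At inference time a zero-shot method only sees the `ComposedQuery` and the gallery; the `target_image` field is what turns a sample into a labeled triplet, which is the annotation the abstract says ZS-CIR avoids for training.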
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
