Instance-Level Composed Image Retrieval
Bill Psomas, George Retsinas, Nikos Efthymiadis, Panagiotis Filntisis, Yannis Avrithis, Petros Maragos, Ondrej Chum, Giorgos Tolias

TL;DR
This paper introduces i-CIR, a new dataset for instance-level composed image retrieval, and proposes BASIC, a training-free method leveraging pre-trained models to improve retrieval accuracy, setting new state-of-the-art results.
Contribution
The paper presents a novel instance-level evaluation dataset for composed image retrieval and a training-free similarity fusion method built on pre-trained vision-and-language models.
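To make "similarity fusion" concrete, the following is a minimal sketch and not the authors' BASIC implementation. It assumes that embeddings for the query image, the query text, and the database images have already been computed in a shared image-text space (for example by a CLIP-style model). The function names `fused_scores` and `retrieve`, and the multiplicative fusion rule, are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def fused_scores(query_image_emb, query_text_emb, db_embs):
    """Fuse image-to-image and text-to-image similarities (hypothetical rule).

    query_image_emb: (d,) embedding of the visual query
    query_text_emb:  (d,) embedding of the textual modification
    db_embs:         (n, d) embeddings of the database images
    """
    q_img = l2_normalize(query_image_emb)
    q_txt = l2_normalize(query_text_emb)
    db = l2_normalize(db_embs)

    # Cosine similarities in [-1, 1], one value per database image.
    sim_img = db @ q_img
    sim_txt = db @ q_txt

    # Assumption: rescale each similarity to [0, 1] and multiply, so an
    # image must match BOTH the object and the text to score highly.
    return ((sim_img + 1.0) / 2.0) * ((sim_txt + 1.0) / 2.0)


def retrieve(query_image_emb, query_text_emb, db_embs, top_k=5):
    """Return indices and scores of the top_k database images."""
    scores = fused_scores(query_image_emb, query_text_emb, db_embs)
    order = np.argsort(-scores, kind="stable")[:top_k]
    return order, scores[order]


if __name__ == "__main__":
    q_img = np.array([1.0, 0.0, 0.0])
    q_txt = np.array([0.0, 1.0, 0.0])
    db = np.array([
        [1.0, 0.0, 0.0],  # matches the image query only
        [0.0, 1.0, 0.0],  # matches the text query only
        [1.0, 1.0, 0.0],  # matches both
    ])
    order, scores = retrieve(q_img, q_txt, db, top_k=3)
    print(order, scores)
```

In this toy example the database item aligned with both queries ranks first, which is the behaviour a composed query is after. A product is only one possible fusion rule; a weighted sum or a minimum would fit the same interface.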
Findings
BASIC achieves state-of-the-art results on the i-CIR dataset.
The i-CIR dataset stays compact while remaining as challenging as retrieval among more than 40M random distractors, thanks to a semi-automated selection of hard negatives.
BASIC also improves on existing CIR results on datasets that follow a semantic-level class definition.
Abstract
The progress of composed image retrieval (CIR), a popular research direction in image retrieval, where a combined visual and textual query is used, is held back by the absence of high-quality training and evaluation data. We introduce a new evaluation dataset, i-CIR, which, unlike existing datasets, focuses on an instance-level class definition. The goal is to retrieve images that contain the same particular object as the visual query, presented under a variety of modifications defined by textual queries. Its design and curation process keep the dataset compact to facilitate future research, while maintaining its challenge (comparable to retrieval among more than 40M random distractors) through a semi-automated selection of hard negatives. To overcome the challenge of obtaining clean, diverse, and suitable training data, we leverage pre-trained vision-and-language models (VLMs) in a…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
