Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
Zhangchi Feng, Richong Zhang, Zhijie Nie

TL;DR
This paper introduces a contrastive learning framework for composed image retrieval that scales up positives by generating triplets with a multi-modal large language model and scales up negatives through a two-stage fine-tuning process, leading to state-of-the-art results on FashionIQ and CIRR.
Contribution
It proposes a scalable data generation method and a two-stage fine-tuning framework for improved contrastive learning in CIR, applicable to existing models without architecture changes.
Findings
Achieves state-of-the-art performance on FashionIQ and CIRR datasets.
Effectively scales positives and negatives for contrastive learning.
Performs well in zero-shot scenarios with limited resources.
Abstract
The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modified text. Advanced methods often utilize contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, the triplet for CIR incurs high manual annotation costs, resulting in limited positive examples. Furthermore, existing methods commonly use in-batch negative sampling, which limits the number of negatives available to the model. To address the lack of positives, we propose a data generation method that leverages a multi-modal large language model to construct triplets for CIR. To introduce more negatives during fine-tuning, we design a two-stage fine-tuning framework for CIR, whose second stage introduces plenty of static representations of negatives to optimize the representation…
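To make the role of negatives concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss in which each composed query is scored against its own target, the other targets in the batch (in-batch negatives), and an optional bank of precomputed embeddings standing in for the "static representations of negatives" mentioned in the abstract. It is not the authors' implementation: the function name `info_nce`, the `extra_negatives` parameter, the temperature of 0.07, and the toy vectors are all assumptions made for illustration, and the abstract does not specify how the second stage is actually built.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, targets, extra_negatives=(), temperature=0.07):
    """Mean InfoNCE loss over a batch (illustrative sketch only).

    queries[i] is a composed-query embedding and targets[i] is its positive.
    The other targets in the batch act as in-batch negatives.
    extra_negatives is a hypothetical bank of precomputed (static) embeddings
    that enlarges the negative set beyond the batch size.
    """
    q = [normalize(v) for v in queries]
    candidates = [normalize(v) for v in targets]
    candidates += [normalize(v) for v in extra_negatives]

    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive at index i.
        total += log_denom - logits[i]
    return total / len(q)


# Toy 2-D embeddings (made up for the example).
queries = [[1.0, 0.0], [0.0, 1.0]]
targets = [[0.9, 0.1], [0.1, 0.9]]
bank = [[0.5, 0.5], [-1.0, 0.2]]

in_batch_only = info_nce(queries, targets)
with_static_bank = info_nce(queries, targets, bank)
```

Because every extra candidate adds a positive term to the softmax denominator, `with_static_bank` is strictly larger than `in_batch_only` for the same embeddings. This is one way to see how a larger negative set gives the model a harder objective than in-batch sampling alone.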
Taxonomy
Topics: Image Retrieval and Classification Techniques · Text and Document Classification Technologies
Methods: Contrastive Learning
