Pseudo-triplet Guided Few-shot Composed Image Retrieval
Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Mingzhu Xu, Xuemeng Song

TL;DR
This paper introduces PTG-FSCIR, a two-stage pseudo triplet guided approach for few-shot composed image retrieval that enhances model training with pseudo triplets and challenging triplet sampling, improving performance across multiple datasets.
Contribution
The paper proposes a novel two-stage pseudo triplet guided scheme for few-shot CIR that addresses data scarcity and indiscriminate triplet selection and is compatible with existing models.
Findings
Achieves up to 22.2% improvement on the CIRR dataset.
Effective pseudo triplet generation from pure image data.
Robust triplet sampling strategy enhances model fine-tuning.
Abstract
Composed Image Retrieval (CIR) is a challenging task that aims to retrieve the target image with a multimodal query, i.e., a reference image and its complementary modification text. As previous supervised and zero-shot learning paradigms all fail to strike a good trade-off between the model's generalization ability and retrieval performance, recent researchers have introduced the task of few-shot CIR (FS-CIR) and proposed a textual inversion-based network built on the pretrained CLIP model to realize it. Despite its promising performance, the approach encounters two key limitations: it relies solely on the few annotated samples for CIR model training, and it selects training triplets indiscriminately for CIR model fine-tuning. To address these two limitations, we propose a novel two-stage pseudo triplet guided few-shot CIR scheme, dubbed PTG-FSCIR. In the first stage, we propose an attentive…
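To make the CIR task setup concrete, here is a minimal sketch of retrieval with a multimodal query. It is not the PTG-FSCIR method: the fusion rule (a normalized sum of the two embeddings) and the function names `compose_query` and `rank_candidates` are assumptions made for illustration only, and the toy vectors stand in for embeddings that a model such as CLIP would produce.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def compose_query(ref_image_emb, mod_text_emb):
    """Hypothetical fusion: normalized sum of the reference-image and
    modification-text embeddings. Real CIR models learn this step."""
    return l2_normalize(l2_normalize(ref_image_emb) + l2_normalize(mod_text_emb))


def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted by descending cosine similarity."""
    sims = l2_normalize(candidate_embs) @ query_emb
    return np.argsort(-sims)


# Toy 3-d embeddings (illustrative values, not real model outputs).
ref_image = np.array([1.0, 0.0, 0.0])   # reference image
mod_text = np.array([0.0, 1.0, 0.0])    # modification text
gallery = np.array([
    [1.0, 0.0, 0.0],  # looks like the reference only
    [0.0, 1.0, 0.0],  # matches the text only
    [1.0, 1.0, 0.0],  # matches both: the intended target
    [0.0, 0.0, 1.0],  # unrelated
])

query = compose_query(ref_image, mod_text)
ranking = rank_candidates(query, gallery)
```

In this toy gallery the candidate that reflects both the reference image and the modification text is ranked first, which is the behavior a CIR model is trained to produce from (reference image, modification text, target image) triplets.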
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Video Analysis and Summarization
Methods: Contrastive Language-Image Pre-training
