PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval
Osman Tursun, Sinan Kalkan, Simon Denman, Clinton Fookes

TL;DR
This paper introduces Prompt Directional Vectors (PDV), a training-free method that enhances zero-shot composed image retrieval (ZS-CIR). It dynamically adjusts text and image embeddings and improves their fusion, which leads to better retrieval accuracy.
Contribution
The paper proposes PDV, a novel, training-free enhancement for ZS-CIR that improves embedding representations and fusion, outperforming existing methods across benchmarks.
Findings
PDV improves retrieval performance across multiple benchmarks.
PDV enhances the quality of composed embeddings for better semantic matching.
Integration of PDV with state-of-the-art methods yields consistent gains.
Abstract
Zero-shot Composed Image Retrieval (ZS-CIR) enables image search using a reference image and a text prompt without requiring specialized text-image composition networks trained on large-scale paired data. However, current ZS-CIR approaches suffer from three critical limitations in their reliance on composed text embeddings: static query embedding representations, insufficient utilization of image embeddings, and suboptimal performance when fusing text and image embeddings. To address these challenges, we introduce the Prompt Directional Vector (PDV), a simple yet effective training-free enhancement that captures semantic modifications induced by user prompts. PDV enables three key improvements: (1) Dynamic composed text embeddings where prompt adjustments are controllable via a scaling factor, (2) composed image embeddings through semantic transfer from text prompts to image…
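The abstract describes PDV only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the PDV is the difference between the embedding of a text that includes the user prompt and one that does not (for example, a caption of the reference image). It also assumes that the scaling factors `alpha` and `beta` add that vector to the text and image embeddings respectively, and that the final query is a weighted linear fusion controlled by `weight`. All function names, parameters, and toy vectors are hypothetical. In practice the embeddings would come from a vision-language encoder such as CLIP.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis (cosine-similarity space)."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def prompt_directional_vector(text_with_prompt, text_without_prompt):
    """Assumed PDV: the shift in text-embedding space caused by adding the user prompt."""
    return text_with_prompt - text_without_prompt


def composed_text_embedding(text_without_prompt, pdv, alpha):
    """Dynamic composed text embedding; alpha scales how far the prompt moves the query."""
    return l2_normalize(text_without_prompt + alpha * pdv)


def composed_image_embedding(image_emb, pdv, beta):
    """Transfer the prompt's semantic shift onto the reference-image embedding."""
    return l2_normalize(image_emb + beta * pdv)


def fuse(text_comp, image_comp, weight):
    """Hypothetical linear fusion of the two composed embeddings."""
    return l2_normalize(weight * text_comp + (1.0 - weight) * image_comp)


def retrieve(query, gallery):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = l2_normalize(gallery) @ query
    return np.argsort(-scores)


# Toy 3-D embeddings standing in for encoder outputs (illustrative only).
t_ref = np.array([1.0, 0.0, 0.0])       # caption of the reference image
t_composed = np.array([1.0, 1.0, 0.0])  # caption plus user prompt
img_ref = np.array([0.9, 0.0, 0.1])     # reference-image embedding
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.7, 0.7, 0.0],
                    [0.0, 0.0, 1.0]])

pdv = prompt_directional_vector(t_composed, t_ref)
query = fuse(composed_text_embedding(t_ref, pdv, alpha=1.0),
             composed_image_embedding(img_ref, pdv, beta=1.0),
             weight=0.5)
ranking = retrieve(query, gallery)
print(ranking)  # [1 0 2]
```

With `alpha=1.0` the composed text embedding equals the normalized prompt-composed text embedding. With `alpha=0` the prompt is ignored. Varying `alpha`, `beta`, and `weight` is where the controllability described above would appear in this sketch. The fusion rule is an assumption and is not taken from the source.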
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
