Dual-Modal Prompting for Sketch-Based Image Retrieval
Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, and Yanning Zhang

TL;DR
This paper introduces DP-CLIP, a dual-modal prompting network that enhances sketch-based image retrieval by effectively adapting to unseen categories and intra-category variations, significantly improving zero-shot retrieval accuracy.
Contribution
The paper proposes a novel adaptive prompting strategy within a dual-modal CLIP framework to improve zero-shot, fine-grained SBIR performance on unseen categories.
Findings
Outperforms state-of-the-art zero-shot SBIR methods by 7.3% in Acc.@1 on the Sketchy dataset.
Effectively adapts to unseen categories using category-adaptive prompts.
Achieves promising results on multiple zero-shot SBIR benchmarks.
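The Acc.@1 figure above is a top-k retrieval accuracy: the share of query sketches whose paired photo is ranked first. The sketch below shows one common way to compute such a metric from precomputed embeddings. It is a minimal illustration, not the authors' evaluation code. The function name `acc_at_k`, the use of cosine similarity, and the single shared gallery are all assumptions made here (fine-grained SBIR evaluations typically rank within a per-category gallery).

```python
import numpy as np

def acc_at_k(sketch_emb, image_emb, gt_index, k=1):
    """Fraction of sketches whose paired image is ranked in the top k.

    sketch_emb: (num_sketches, dim) query embeddings (hypothetical input).
    image_emb:  (num_images, dim) gallery embeddings (hypothetical input).
    gt_index:   for each sketch, the gallery row of its paired image.
    """
    # L2-normalise so the dot product equals cosine similarity (an assumption).
    s = sketch_emb / np.linalg.norm(sketch_emb, axis=1, keepdims=True)
    g = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sim = s @ g.T  # shape: (num_sketches, num_images)

    # Gallery indices of the k most similar images for each sketch.
    topk = np.argsort(-sim, axis=1)[:, :k]

    # A sketch counts as a hit if its paired image is among those k.
    hits = (topk == np.asarray(gt_index)[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    # Toy data: three sketches, three gallery images, paired by position.
    sketches = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    images = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])
    print(acc_at_k(sketches, images, [0, 1, 2], k=1))
```

In this toy example the first two sketches retrieve their paired images at rank 1 and the third does not, so Acc.@1 is 2/3.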
Abstract
Sketch-based image retrieval (SBIR) associates hand-drawn sketches with their corresponding realistic images. In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval. Our key innovation lies in the realization that solely addressing this cross-category and fine-grained recognition task from the generalization perspective may be inadequate since the knowledge accumulated from limited seen categories might not be fully valuable or transferable to unseen target categories. Inspired by this, in this work, we propose a dual-modal prompting CLIP (DP-CLIP) network, in which an adaptive prompting strategy is designed. Specifically, to facilitate the adaptation of our DP-CLIP toward unpredictable target categories, we employ a set of images within the…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
