Dual-Modal Prompting for Sketch-Based Image Retrieval

Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, and Yanning Zhang

arXiv:2404.18695 · cs.CV · April 30, 2024



TL;DR

This paper introduces DP-CLIP, a dual-modal prompting network for sketch-based image retrieval that adapts to unseen categories and to intra-category variations, improving zero-shot retrieval accuracy (7.3% higher Acc.@1 than the previous state of the art on the Sketchy dataset).

Contribution

The paper proposes a novel adaptive prompting strategy within a dual-modal CLIP framework to improve zero-shot, fine-grained SBIR performance on unseen categories.

Findings

1. Outperforms state-of-the-art zero-shot SBIR methods by 7.3% in Acc.@1 on the Sketchy dataset.
2. Adapts effectively to unseen categories using category-adaptive prompts.
3. Achieves promising results on multiple zero-shot SBIR benchmarks.
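Acc.@1, the metric behind the 7.3% figure, is conventionally the fraction of query sketches whose ground-truth photo is ranked first among the gallery; Acc.@K relaxes this to the top K. The following is a minimal sketch of that computation, assuming embeddings have already been extracted, that row i of the sketch and photo arrays form a matching pair, and that the gallery is exactly the photos passed in (for example, the photos of one category in fine-grained evaluation). The exact gallery construction used in the paper is not stated here, and `acc_at_k` is a hypothetical helper name.

```python
import numpy as np


def acc_at_k(sketch_emb: np.ndarray, photo_emb: np.ndarray, k: int = 1) -> float:
    """Fraction of query sketches whose paired photo ranks within the top k.

    Assumption: row i of sketch_emb is paired with row i of photo_emb, and the
    gallery for every query is the full set of rows in photo_emb.
    """
    # L2-normalise so that a dot product equals cosine similarity.
    s = sketch_emb / np.linalg.norm(sketch_emb, axis=1, keepdims=True)
    p = photo_emb / np.linalg.norm(photo_emb, axis=1, keepdims=True)
    sim = s @ p.T  # (num_sketches, num_photos)

    # Similarity of each sketch to its own ground-truth photo.
    true_sim = np.diag(sim)
    # 0-based rank = number of photos scoring strictly higher than the true match
    # (ties are resolved in favour of the true match).
    rank = (sim > true_sim[:, None]).sum(axis=1)
    return float((rank < k).mean())


# Toy example with 2-D embeddings: sketch 1 is retrieved at rank 1,
# sketch 0 at rank 2, and sketch 2 at rank 3.
sketches = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
photos = np.array([[0.9, 0.1], [0.2, 1.0], [1.0, 0.0]])
print(acc_at_k(sketches, photos, k=1))  # 0.3333333333333333
```

Reporting the metric over several values of K (for instance Acc.@1 and Acc.@5) shows whether misses are near-misses or far off.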

Abstract

Sketch-based image retrieval (SBIR) associates hand-drawn sketches with their corresponding realistic images. In this study, we aim to tackle two major challenges of this task simultaneously: (i) zero-shot retrieval, which deals with unseen categories, and (ii) fine-grained retrieval, which refers to intra-category, instance-level retrieval. Our key insight is that addressing this cross-category, fine-grained recognition task solely from a generalization perspective may be inadequate, because the knowledge accumulated from a limited set of seen categories may not be fully valuable or transferable to unseen target categories. Motivated by this, we propose a dual-modal prompting CLIP (DP-CLIP) network with an adaptive prompting strategy. Specifically, to facilitate the adaptation of DP-CLIP toward unpredictable target categories, we employ a set of images within the…
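The abstract is cut off before it says how the target-category images are used, so the block below is only a hypothetical illustration of one common way to condition prompts on a set of images. It mean-pools their features into a category prototype, projects that prototype into a few prompt tokens, and prepends those tokens to an encoder's input sequence, in the spirit of visual prompt tuning. It is not DP-CLIP's actual architecture. `category_adaptive_prompts`, `prepend_prompts`, the feature sizes, and the random projection are all assumptions.

```python
import numpy as np


def category_adaptive_prompts(support_feats: np.ndarray, proj: np.ndarray,
                              num_tokens: int) -> np.ndarray:
    """HYPOTHETICAL: map a set of same-category image features to prompt tokens.

    support_feats: (n_images, d_feat) features of images from one target category.
    proj: (d_feat, num_tokens * d_model) projection; learned in a real model,
          random below purely for illustration.
    Returns an array of shape (num_tokens, d_model).
    """
    # Mean-pool the support set into one category prototype; the result does
    # not depend on the order of the images.
    prototype = support_feats.mean(axis=0)
    prototype = prototype / np.linalg.norm(prototype)
    # Project the prototype and split it into num_tokens prompt vectors.
    return (prototype @ proj).reshape(num_tokens, -1)


def prepend_prompts(tokens: np.ndarray, prompts: np.ndarray) -> np.ndarray:
    """Prepend prompt tokens to a (seq_len, d_model) encoder input sequence."""
    return np.concatenate([prompts, tokens], axis=0)


rng = np.random.default_rng(0)
d_feat, d_model, num_tokens = 512, 768, 4
support = rng.normal(size=(8, d_feat))            # 8 images of one category
proj = rng.normal(scale=0.02, size=(d_feat, num_tokens * d_model))
patch_tokens = rng.normal(size=(50, d_model))     # e.g. [CLS] + 7x7 patch tokens

prompts = category_adaptive_prompts(support, proj, num_tokens)
encoder_input = prepend_prompts(patch_tokens, prompts)
print(prompts.shape, encoder_input.shape)  # (4, 768) (54, 768)
```

Mean pooling makes the prompts invariant to the order of the support images, which suits a set-valued input; an attention-based aggregator would be another reasonable choice under the same assumptions.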


Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training