CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not

Aneeshan Sain; Ayan Kumar Bhunia; Pinaki Nath Chowdhury; Subhadeep Koley; Tao Xiang; Yi-Zhe Song

arXiv:2303.13440 · cs.CV · March 29, 2023 · 5 cites

Open Access

TL;DR

This paper adapts CLIP with prompt learning for zero-shot sketch-based image retrieval (ZS-SBIR). It reports gains over prior art of 24.8% in the category-level setting and 26.9% in the fine-grained setting, the latter obtained with an added regularization strategy and a patch shuffling technique.

Contribution

It presents a novel prompt learning approach tailored for CLIP to enhance zero-shot sketch-based image retrieval, including new methods for fine-grained matching.

Findings

01

24.8% improvement in category-level ZS-SBIR over prior art

02

26.9% performance gain in fine-grained ZS-SBIR

03

Prompt learning with regularization and patch shuffling enhances retrieval accuracy
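
Patch shuffling is only named on this page, not described, so the following is a minimal sketch of the generic idea rather than the authors' procedure. It assumes an image stored as a NumPy array of shape (H, W, C) whose height and width are divisible by the grid size; the function name `shuffle_patches` and its parameters are hypothetical.

```python
import numpy as np


def shuffle_patches(image, grid, rng):
    """Split an (H, W, C) image into grid x grid patches and permute them.

    Returns the shuffled image and the permutation that was applied.
    This is a generic illustration, not the paper's exact method.
    """
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid

    # (H, W, C) -> (grid, ph, grid, pw, C) -> (grid*grid, ph, pw, C),
    # so that patch index p = row * grid + col.
    patches = (
        image.reshape(grid, ph, grid, pw, c)
        .swapaxes(1, 2)
        .reshape(grid * grid, ph, pw, c)
    )

    # Draw a random ordering of the patches and apply it.
    order = rng.permutation(grid * grid)
    shuffled = patches[order]

    # Invert the reshape to stitch the permuted patches back into an image.
    out = (
        shuffled.reshape(grid, grid, ph, pw, c)
        .swapaxes(1, 2)
        .reshape(h, w, c)
    )
    return out, order
```

Calling `shuffle_patches(img, 2, np.random.default_rng(0))` on a 4 x 4 x 3 array returns an array of the same shape with its four 2 x 2 patches rearranged, plus the permutation used, which makes the shuffle reproducible and invertible.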

Abstract

In this paper, we leverage CLIP for zero-shot sketch based image retrieval (ZS-SBIR). We are largely inspired by recent advances on foundation models and the unparalleled generalisation ability they seem to offer, but for the first time tailor it to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained setting ("all"). At the very core of our solution is a prompt learning setup. First we show just via factoring in sketch-specific prompts, we already have a category-level ZS-SBIR system that overshoots all prior arts, by a large margin (24.8%) - a great testimony on studying the CLIP and ZS-SBIR synergy. Moving onto the fine-grained setup is however trickier, and requires a deeper dive into this synergy. For that, we come up with two specific designs to tackle the fine-grained matching nature of…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques

Methods: Contrastive Language-Image Pre-training