Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

TL;DR
This paper demonstrates that text-to-image diffusion models can effectively facilitate zero-shot sketch-based image retrieval by leveraging their cross-modal capabilities and shape bias, with simple strategies for feature selection and prompt usage.
Contribution
The paper introduces a novel application of diffusion models to ZS-SBIR, highlighting their cross-modal abilities and proposing a method for selecting optimal feature layers and using visual and textual prompts.
Findings
Significant performance improvements on benchmark datasets.
Effective bridging of sketches and photos via diffusion models.
Identification of optimal feature layers for retrieval tasks.
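A minimal sketch of the retrieval step these findings imply, assuming sketch and photo features have already been extracted from some chosen layer of a frozen diffusion model. The feature extractor is not shown, the function names `l2_normalize` and `rank_gallery` are hypothetical, and cosine similarity is an assumed ranking metric rather than one confirmed by this summary.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def rank_gallery(sketch_feat, photo_feats):
    """Rank gallery photos by cosine similarity to one sketch query.

    sketch_feat: shape (d,), feature of the query sketch (assumed precomputed).
    photo_feats: shape (n, d), features of the gallery photos (assumed precomputed).
    Returns (order, scores): gallery indices from best to worst match,
    and the corresponding cosine similarities.
    """
    q = l2_normalize(np.asarray(sketch_feat, dtype=np.float64))
    g = l2_normalize(np.asarray(photo_feats, dtype=np.float64))
    sims = g @ q  # cosine similarity, since both sides are unit length
    order = np.argsort(-sims, kind="stable")
    return order, sims[order]

# Toy features standing in for real diffusion-model activations.
sketch = np.array([1.0, 0.0, 0.0])
photos = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the sketch
    [2.0, 0.1, 0.0],   # nearly parallel to the sketch
    [1.0, 1.0, 0.0],   # 45 degrees from the sketch
])
order, scores = rank_gallery(sketch, photos)
print(order.tolist())
```

In a zero-shot setting the gallery would contain photos from categories unseen during any training, and the same ranking routine could serve both category-level and fine-grained retrieval, with only the source layer of the features changing.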
Abstract
This paper, for the first time, explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos. This proficiency is underpinned by their robust cross-modal capabilities and shape bias, findings that are substantiated through our pilot studies. In order to harness pre-trained diffusion models effectively, we introduce a straightforward yet powerful strategy focused on two key aspects: selecting optimal feature layers and utilising visual and textual prompts. For the former, we identify which layers are most enriched with information and are best suited for the specific retrieval requirements (category-level or fine-grained). Then we employ visual and textual prompts to guide the model's feature extraction process,…
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Diffusion
