You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

Subhadeep Koley; Ayan Kumar Bhunia; Aneeshan Sain; Pinaki Nath Chowdhury; Tao Xiang; Yi-Zhe Song

arXiv:2403.07222·cs.CV·March 22, 2024·2 cites

TL;DR

This paper introduces a duet framework that combines sketches and text for fine-grained image retrieval. It improves retrieval precision and enables new applications by building on pre-trained CLIP models, without requiring extensive fine-grained textual descriptions.

Contribution

It proposes a compositionality framework that integrates sketches and text for fine-grained retrieval, extending capabilities beyond sketch-only methods using pre-trained CLIP models.

Findings

01. Enhanced retrieval precision with combined sketch and text inputs

02. Enabled fine-grained queries incorporating color and context

03. Extended to applications like image retrieval and attribute transfer

Abstract

Two primary input modalities prevail in image retrieval: sketch and text. While text is widely used for inter-category retrieval tasks, sketches have been established as the sole preferred modality for fine-grained image retrieval due to their ability to capture intricate visual details. In this paper, we question the reliance on sketches alone for fine-grained image retrieval by simultaneously exploring the fine-grained representation capabilities of both sketch and text, orchestrating a duet between the two. The end result enables precise retrievals previously unattainable, allowing users to pose ever-finer queries and incorporate attributes like colour and contextual cues from text. For this purpose, we introduce a novel compositionality framework, effectively combining sketches and text using pre-trained CLIP models, while eliminating the need for extensive fine-grained textual…
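
To make the idea of a combined sketch and text query concrete, here is a minimal sketch of one simple way to fuse two embeddings and rank a gallery by cosine similarity. It is not the paper's compositionality framework: it swaps in plain weighted late fusion for illustration. The function names, the `alpha` weight, and the 3-dimensional toy vectors are all assumptions; in practice the vectors would come from a pre-trained CLIP encoder.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if all-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def compose_query(sketch_emb, text_emb, alpha=0.5):
    """Blend unit-normalised sketch and text embeddings.

    alpha is a hypothetical weight on the sketch; (1 - alpha) goes to the text.
    """
    s, t = normalize(sketch_emb), normalize(text_emb)
    return normalize([alpha * a + (1 - alpha) * b for a, b in zip(s, t)])

def rank_gallery(query_emb, gallery_embs):
    """Return gallery indices sorted by descending cosine similarity."""
    q = normalize(query_emb)
    scores = [sum(a * b for a, b in zip(q, normalize(g))) for g in gallery_embs]
    return sorted(range(len(gallery_embs)), key=lambda i: scores[i], reverse=True)

# Toy 3-d embeddings standing in for CLIP features (made-up numbers).
sketch = [1.0, 0.0, 0.0]  # pretend this axis encodes shape
text = [0.0, 1.0, 0.0]    # pretend this axis encodes colour
gallery = [
    [1.0, 0.0, 0.0],  # right shape, wrong colour
    [0.0, 1.0, 0.0],  # right colour, wrong shape
    [1.0, 1.0, 0.0],  # right shape and right colour
    [0.0, 0.0, 1.0],  # unrelated image
]

ranking = rank_gallery(compose_query(sketch, text), gallery)
print(ranking)
```

In this toy setup the image that matches both the sketch and the text ranks first, while either modality alone would tie it with a partial match. That is the intuition behind letting text add attributes such as colour to a sketch query.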

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Contrastive Language-Image Pre-training