CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval

Ushasi Chaudhuri; Biplab Banerjee; Avik Bhattacharya; Mihai Datcu

arXiv:2104.09918·cs.CV·April 22, 2021

TL;DR

CrossATNet introduces a cross-attention framework for zero-shot sketch-based image retrieval, leveraging semantic graph propagation and hash coding to improve discriminability and efficiency in cross-modal retrieval tasks.

Contribution

The paper proposes a novel cross-attention based framework with semantic graph propagation and hash coding for zero-shot SBIR, addressing limitations of existing generative models.
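As a rough illustration of two ingredients named above, the following is a minimal sketch of generic scaled dot-product cross-attention between two feature sets, followed by sign-based hash codes ranked by Hamming distance. It is not the paper's architecture: the function names, the toy feature vectors, and the choice of sign thresholding are all assumptions made for illustration, and the semantic graph propagation step is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Generic scaled dot-product attention (hypothetical helper).

    Each query vector (e.g. from the sketch branch) is replaced by a
    weighted average of the value vectors (e.g. from the image branch),
    with weights given by query-key similarity.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return attended


def to_hash_code(vector):
    """Binarize an embedding by sign: 1 if positive, else 0 (assumed scheme)."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, gallery_codes):
    """Gallery indices sorted by increasing Hamming distance to the query."""
    return sorted(
        range(len(gallery_codes)),
        key=lambda i: hamming(query_code, gallery_codes[i]),
    )


# Toy data (made up): two sketch features attend over two image features.
sketch_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[2.0, 0.0], [0.0, 2.0]]
attended = cross_attend(sketch_feats, image_feats, image_feats)

# Toy retrieval: rank three gallery codes against one query code.
ranking = rank_by_hamming([1, 0, 1], [[0, 1, 0], [1, 0, 1], [1, 1, 1]])
```

Comparing short binary codes with Hamming distance is cheap relative to comparing real-valued embeddings, which is the usual motivation for hashing in retrieval; whether that accounts for the response-time gains reported here is not stated in this summary.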

Findings

01

Achieves state-of-the-art results on TU-Berlin and Sketchy datasets.

02

Demonstrates improved retrieval accuracy and response time.

03

Effectively models discriminative shared space for sketches and images.

Abstract

We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema mainly considers simultaneous mappings among the two image views and the semantic side information. Therefore, it is desirable to consider fine-grained classes mainly in the sketch domain using a highly discriminative and semantically rich feature space. However, the existing deep generative modeling-based SBIR approaches focus mainly on bridging the gaps between the seen and unseen classes by generating pseudo-unseen-class samples. Besides violating the ZSL protocol of not utilizing any unseen-class information during training, such techniques do not pay explicit attention to modeling the discriminative nature of the shared space. Also, we note that learning a unified feature space for both views of the visual data is a tedious…

Taxonomy

Methods: Triplet Loss