CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval
Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu

TL;DR
CrossATNet introduces a cross-attention framework for zero-shot sketch-based image retrieval, leveraging semantic graph propagation and hash coding to improve discriminability and efficiency in cross-modal retrieval tasks.
Contribution
The paper proposes a novel cross-attention based framework with semantic graph propagation and hash coding for zero-shot SBIR, addressing limitations of existing generative models.
Findings
Achieves state-of-the-art results on TU-Berlin and Sketchy datasets.
Demonstrates improved retrieval accuracy and response time.
Effectively models a discriminative shared space for sketches and images.
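As a rough illustration of why hash coding can help response time, the sketch below ranks a gallery of binary codes by Hamming distance to a query code. It is not the paper's implementation: the sign-threshold rule in `binarize`, the function names, and the toy codes are all assumptions made for this example.

```python
# Minimal sketch of hash-code retrieval (illustrative only, not CrossATNet's code).

def binarize(features):
    """Turn real-valued features into bits by sign thresholding (assumed rule)."""
    return tuple(1 if x > 0 else 0 for x in features)


def hamming(a, b):
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_gallery(query_code, gallery_codes):
    """Return gallery indices ordered from nearest to farthest in Hamming distance."""
    return sorted(
        range(len(gallery_codes)),
        key=lambda i: hamming(query_code, gallery_codes[i]),
    )


# Hypothetical sketch embedding and image codes.
query = binarize([0.3, -1.2, 0.8, -0.1])
gallery = [(0, 1, 0, 1), (1, 0, 1, 1), (1, 0, 1, 0)]
ranking = rank_gallery(query, gallery)
```

Comparing short binary codes needs only bit comparisons, which is the usual reason hashing is cheaper at query time than comparing real-valued embeddings.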
Abstract
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema mainly considers simultaneous mappings between the two image views and the semantic side information. Therefore, it is desirable to consider fine-grained classes mainly in the sketch domain using a highly discriminative and semantically rich feature space. However, the existing deep generative modeling-based SBIR approaches mainly focus on bridging the gaps between the seen and unseen classes by generating pseudo-unseen-class samples. Besides violating the ZSL protocol of not utilizing any unseen-class information during training, such techniques do not pay explicit attention to modeling the discriminative nature of the shared space. Also, we note that learning a unified feature space for both views of the visual data is a tedious…
Taxonomy
Methods: Triplet Loss
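For reference, a triplet loss in its common form penalizes an anchor that is not closer to a same-class sample than to a different-class sample by at least a margin. The sketch below shows that generic form only. The Euclidean distance, the margin of 1.0, and the function names are assumptions for this example, and the paper's exact formulation may differ.

```python
# Generic triplet loss sketch (illustrative; not taken from the paper).
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(
        0.0,
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin,
    )


# Hypothetical embeddings: a sketch anchor, a same-class image, an other-class image.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In a cross-modal setting the anchor would typically be a sketch embedding and the positive and negative would be image embeddings, so minimizing the loss pulls matching sketch-image pairs together in the shared space.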
