Zero-Shot Sketch Based Image Retrieval using Graph Transformer
Sumrit Gupta, Ushasi Chaudhuri, Biplab Banerjee

TL;DR
This paper introduces GTZSR, a graph transformer-based framework for zero-shot sketch-based image retrieval (ZS-SBIR) that bridges the domain gap between images and sketches and uses the class topology of the semantic space, reporting improvements over prior state-of-the-art methods.
Contribution
The paper proposes a novel graph transformer model, together with a learned domain-shared space in which a Wasserstein distance and a compatibility loss are used to align image and sketch features for ZS-SBIR.
Findings
Sharp improvements over state-of-the-art in ZS-SBIR and generalized ZS-SBIR.
Effective preservation of class topology in semantic space.
Bridging domain gaps with Wasserstein distance and compatibility loss.
Abstract
The performance of a zero-shot sketch-based image retrieval (ZS-SBIR) task is primarily affected by two challenges. The substantial domain gap between image and sketch features needs to be bridged, while at the same time the side information has to be chosen tactfully. Existing literature has shown that varying the semantic side information greatly affects the performance of ZS-SBIR. To this end, we propose a novel graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks which uses a novel graph transformer to preserve the topology of the classes in the semantic space and propagates the context-graph of the classes within the embedding features of the visual space. To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space. We also…
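The abstract's idea of minimizing the Wasserstein distance between image and sketch features can be illustrated with a small numerical sketch. The code below is not the paper's formulation: it uses a sliced (random-projection) approximation of the Wasserstein-1 distance, assumes equal-size batches, and the names `sliced_wasserstein`, `image_emb`, and `sketch_emb` are hypothetical. In actual training, a differentiable version of such a quantity would be computed on learned embeddings.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Approximate Wasserstein-1 distance between two equal-size point sets.

    x, y: arrays of shape (n, d), e.g. image and sketch embeddings.
    The slicing scheme is an illustrative assumption, not the paper's loss.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal-size batches")
    rng = np.random.default_rng(seed)
    # Draw random unit directions in the embedding space.
    dirs = rng.normal(size=(n_projections, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project both sets onto every direction and sort each 1-D projection.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    # In 1-D, optimal transport between equal-size samples matches sorted values.
    return float(np.mean(np.abs(px - py)))


# Toy stand-ins for embeddings in a shared space (random data, not real features).
rng = np.random.default_rng(1)
image_emb = rng.normal(size=(128, 16))
sketch_emb = rng.normal(loc=2.0, size=(128, 16))

same = sliced_wasserstein(image_emb, image_emb)      # identical sets
shifted = sliced_wasserstein(image_emb, sketch_emb)  # shifted distribution
print(same, shifted)
```

Identical sets give a distance of zero, while the shifted set gives a strictly positive value; driving that value down is the intuition behind aligning the two domains.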
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Laplacian EigenMap · Dropout · Byte Pair Encoding · Dense Connections · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Adam
