Zero-Shot Sketch Based Image Retrieval using Graph Transformer

Sumrit Gupta; Ushasi Chaudhuri; Biplab Banerjee

arXiv:2201.10185·cs.CV·May 31, 2022

Open Access

TL;DR

This paper introduces a graph transformer-based framework for zero-shot sketch-based image retrieval that effectively bridges domain gaps and utilizes semantic class topology, resulting in significant performance improvements.

Contribution

The paper proposes a novel graph transformer model and a domain-shared space with Wasserstein distance and compatibility loss for improved ZS-SBIR performance.

Findings

1. Sharp improvements over state-of-the-art in ZS-SBIR and generalized ZS-SBIR.
2. Effective preservation of class topology in semantic space.
3. Bridging domain gaps with Wasserstein distance and compatibility loss.

Abstract

The performance of a zero-shot sketch-based image retrieval (ZS-SBIR) task is primarily affected by two challenges. The substantial domain gap between image and sketch features needs to be bridged, while at the same time the side information has to be chosen tactfully. Existing literature has shown that varying the semantic side information greatly affects the performance of ZS-SBIR. To this end, we propose a novel graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks which uses a novel graph transformer to preserve the topology of the classes in the semantic space and propagates the context-graph of the classes within the embedding features of the visual space. To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space. We also…
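To make the domain-gap term more concrete, the following is a minimal sketch of one way a Wasserstein distance between a batch of image embeddings and a batch of sketch embeddings could be estimated. It is not the authors' implementation: the abstract does not say how the distance is computed, so the sliced (random-projection) approximation, the function name `sliced_wasserstein`, and the parameters `n_projections` and `seed` are all assumptions made for illustration. It relies only on the fact that, for two equal-sized one-dimensional samples, the Wasserstein-1 distance equals the mean absolute difference of the sorted values.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Approximate the Wasserstein-1 distance between two equal-sized batches.

    xs, ys: lists of embedding vectors (lists of floats) of identical shape.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(xs) != len(ys) or len(xs[0]) != len(ys[0]):
        raise ValueError("both batches must have the same shape")

    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0

    for _ in range(n_projections):
        # Draw a random unit direction in the embedding space.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        direction = [v / norm for v in direction]

        # Project both batches onto the direction and sort the results.
        px = sorted(sum(a * b for a, b in zip(row, direction)) for row in xs)
        py = sorted(sum(a * b for a, b in zip(row, direction)) for row in ys)

        # 1-D Wasserstein-1 distance between equal-sized empirical samples.
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)

    # Average the 1-D distances over all random directions.
    return total / n_projections


# Toy embeddings standing in for the two domains (made-up values).
image_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sketch_embeddings = [[a + 2.0, b + 2.0] for a, b in image_embeddings]
gap = sliced_wasserstein(image_embeddings, sketch_embeddings)
```

Under these assumptions, identical batches give a distance of zero and the value grows as the two sets of embeddings drift apart, which is the quantity a domain-alignment loss of this kind would push down during training.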

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Laplacian EigenMap · Dropout · Byte Pair Encoding · Dense Connections · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Adam