WAD-CMSN: Wasserstein Distance based Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval

Guanglong Xu; Zhensheng Hu; Jia Cai

arXiv:2202.05465 · cs.CV · February 14, 2022 · 1 cites

Open Access

TL;DR

This paper introduces WAD-CMSN, a novel Wasserstein distance-based network that effectively aligns cross-modal sketch and image features in a shared semantic space, significantly improving zero-shot sketch-based image retrieval performance.

Contribution

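It proposes a Wasserstein distance based adversarial training framework for cross-domain alignment in ZSSBIR. The framework targets a limitation of previous methods, which ignore the heterogeneous discrepancy between abstract sketches and natural images, with the aim of improving retrieval accuracy.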

Findings

01. Outperforms existing methods on the Sketchy and TU-Berlin datasets.

02. Reduces the cross-domain discrepancy between sketches and images.

03. Improves semantic feature alignment and retrieval accuracy.

Abstract

Zero-shot sketch-based image retrieval (ZSSBIR), a widely studied branch of computer vision, has attracted wide attention recently. Unlike sketch-based image retrieval (SBIR), the main aim of ZSSBIR is to retrieve natural images given free-hand sketches that may not appear during training. Previous approaches used semantically aligned sketch-image pairs or a memory-expensive fusion layer to project the visual information to a low-dimensional subspace, ignoring the significant heterogeneous cross-domain discrepancy between a highly abstract sketch and its relevant image. This may yield poor performance in the training phase. To tackle this issue, we propose a Wasserstein distance based cross-modal semantic network (WAD-CMSN) for ZSSBIR. Specifically, it first projects the visual information of each branch (sketch, image) to a common low dimensional…
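
To make the Wasserstein distance mentioned in the abstract concrete, here is a minimal toy sketch, not the authors' network. It assumes each sketch or image feature has already been reduced to a single scalar, and the function names `wasserstein_1d` and `critic_gap` are hypothetical. For equal-size 1-D samples, the Wasserstein-1 distance is the mean absolute difference between sorted values; a WGAN-style critic instead estimates it from below as a gap between mean critic scores, which holds when the critic is 1-Lipschitz.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal transport plan matches sorted values,
    so the distance is the mean absolute difference after sorting.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


def critic_gap(critic, sketch_feats, image_feats):
    """Difference of mean critic scores over the two feature sets.

    For a 1-Lipschitz critic this gap never exceeds the Wasserstein-1
    distance; adversarial training searches for the critic that makes
    it as large as possible.
    """
    mean_sketch = sum(critic(f) for f in sketch_feats) / len(sketch_feats)
    mean_image = sum(critic(f) for f in image_feats) / len(image_feats)
    return mean_sketch - mean_image


if __name__ == "__main__":
    # Hypothetical scalar features: the image set is the sketch set shifted by 1.
    sketch_feats = [0.0, 1.0, 2.0]
    image_feats = [1.0, 2.0, 3.0]
    print(wasserstein_1d(sketch_feats, image_feats))
    # The critic f(x) = -x is 1-Lipschitz and attains the distance for a pure shift.
    print(critic_gap(lambda x: -x, sketch_feats, image_feats))
```

In an adversarial setup of this kind, the feature encoders would be trained to shrink the gap while the critic is trained to enlarge it; how WAD-CMSN arranges these steps is not specified in the truncated abstract above.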

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning