WAD-CMSN: Wasserstein Distance based Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval
Guanglong Xu, Zhensheng Hu, Jia Cai

TL;DR
This paper introduces WAD-CMSN, a novel Wasserstein distance-based network that effectively aligns cross-modal sketch and image features in a shared semantic space, significantly improving zero-shot sketch-based image retrieval performance.
Contribution
The paper proposes a Wasserstein distance based adversarial training framework for cross-domain alignment in ZSSBIR. It targets the heterogeneous discrepancy between abstract sketches and natural images that, according to the abstract, earlier methods ignored.
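To make the idea of a Wasserstein critic concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a linear critic f(x) = x @ w constrained to ||w|| <= 1 (so f is 1-Lipschitz), which gives a lower bound on the Wasserstein-1 distance between sketch and image embeddings through the dual form E[f(sketch)] - E[f(image)]. The function names, the toy Gaussian embeddings, and the mean-shift "alignment" step are all hypothetical and chosen only for illustration; the paper's actual networks and losses are not described in this summary.

```python
import numpy as np


def critic_gap(sketch_emb, image_emb, w):
    """Dual-form estimate E[f(sketch)] - E[f(image)] for a linear critic f(x) = x @ w."""
    return float(np.mean(sketch_emb @ w) - np.mean(image_emb @ w))


def fit_linear_critic(sketch_emb, image_emb, steps=200, lr=0.1):
    """Projected gradient ascent on the gap, keeping ||w|| <= 1 so f stays 1-Lipschitz."""
    w = np.zeros(sketch_emb.shape[1])
    # For a linear critic, the gradient of the gap w.r.t. w is the difference of means.
    grad = sketch_emb.mean(axis=0) - image_emb.mean(axis=0)
    for _ in range(steps):
        w = w + lr * grad
        norm = np.linalg.norm(w)
        if norm > 1.0:
            w = w / norm  # project back onto the unit ball
    return w


# Toy embeddings (assumption): two Gaussian clouds in a shared 8-d space.
rng = np.random.default_rng(0)
sketch_emb = rng.normal(loc=1.0, scale=1.0, size=(256, 8))
image_emb = rng.normal(loc=0.0, scale=1.0, size=(256, 8))

w = fit_linear_critic(sketch_emb, image_emb)
gap_before = critic_gap(sketch_emb, image_emb, w)

# Toy stand-in for the feature extractor's update: shift sketches so the means match.
aligned = sketch_emb - (sketch_emb.mean(axis=0) - image_emb.mean(axis=0))
w_aligned = fit_linear_critic(aligned, image_emb)
gap_after = critic_gap(aligned, image_emb, w_aligned)

print(f"critic gap before alignment: {gap_before:.3f}")
print(f"critic gap after alignment:  {gap_after:.3f}")
```

In an adversarial setup of this kind, the critic is trained to maximize the gap while the embedding networks are trained to minimize it. A linear critic only detects differences in the mean embedding, so practical systems typically use a neural critic with a Lipschitz constraint such as weight clipping or a gradient penalty.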
Findings
Outperforms existing methods on the Sketchy and TU-Berlin datasets.
Effectively reduces cross-domain discrepancy between sketches and images.
Improves semantic feature alignment and retrieval accuracy.
Abstract
Zero-shot sketch-based image retrieval (ZSSBIR), a widely studied branch of computer vision, has recently attracted wide attention. Unlike sketch-based image retrieval (SBIR), ZSSBIR aims to retrieve natural images given free-hand sketches that may not appear during training. Previous approaches used semantically aligned sketch-image pairs or a memory-expensive fusion layer to project the visual information into a low-dimensional subspace, ignoring the significant heterogeneous cross-domain discrepancy between highly abstract sketches and the relevant images. This may yield poor performance in the training phase. To address this drawback, we propose a Wasserstein distance based cross-modal semantic network (WAD-CMSN) for ZSSBIR. Specifically, it first projects the visual information of each branch (sketch, image) to a common low dimensional…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
