Distribution Aligned Feature Clustering for Zero-Shot Sketch-Based Image Retrieval
Yuchen Wu, Kun Song, Fangzheng Zhao, Jiansheng Chen, Huimin Ma

TL;DR
This paper introduces a zero-shot sketch-based image retrieval (ZS-SBIR) method that clusters the gallery images and aligns the sketch and image feature distributions to improve cross-modal retrieval, outperforming previous methods by a large margin.
Contribution
The paper proposes a cluster-then-retrieve framework with a distribution alignment loss to reduce the domain gap in ZS-SBIR, an approach the authors present as a new perspective on the task.
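The exact form of the distribution alignment loss is not given on this page. As a minimal sketch only, one hypothetical way to pull both modalities toward a common Gaussian is to fit a diagonal Gaussian to each batch of features and penalize its KL divergence from the standard normal. The function names, the choice of N(0, I) as the shared target, and the `eps` parameter are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def kl_to_standard_normal(features, eps=1e-6):
    """KL( N(mu, diag(var)) || N(0, I) ) for a batch of features.

    Hypothetical stand-in for a distribution alignment term: mu and var
    are the per-dimension batch mean and variance.
    """
    mu = features.mean(axis=0)
    var = features.var(axis=0) + eps  # eps keeps log(var) finite
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))

def alignment_loss(image_feats, sketch_feats):
    """Sum of both modalities' divergences from the shared Gaussian target."""
    return kl_to_standard_normal(image_feats) + kl_to_standard_normal(sketch_feats)
```

The term is zero when a batch already has zero mean and unit variance in every dimension, and grows as either modality drifts away from that shared target.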
Findings
Outperforms state-of-the-art methods by up to 39% relative improvement in mAP@all.
Reduces the domain gap between sketches and images.
Achieves significant improvements on the Sketchy and TU-Berlin datasets.
Abstract
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a challenging cross-modal retrieval task. In prior art, retrieval is conducted by sorting the distances between the query sketch and each image in the gallery. However, the domain gap and the zero-shot setting make it hard for neural networks to generalize. This paper tackles the challenges from a new perspective: utilizing gallery image features. We propose a Cluster-then-Retrieve (ClusterRetri) method that performs clustering on the gallery images and uses the cluster centroids as proxies for retrieval. Furthermore, a distribution alignment loss is proposed to align the image and sketch features with a common Gaussian distribution, reducing the domain gap. Despite its simplicity, our proposed method outperforms the state-of-the-art methods by a large margin on popular datasets, e.g., up to 31% and 39% relative improvement of mAP@all on…
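The cluster-then-retrieve idea in the abstract can be illustrated with a short sketch. It assumes precomputed feature vectors, a plain k-means written in NumPy, and one plausible reading of "centroids as proxies": each gallery image is ranked by the distance from the query sketch to the centroid of that image's cluster. The function names, the number of clusters, and the iteration count are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain k-means; returns (centroids, cluster label per feature)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct gallery features.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def cluster_then_retrieve(query_feat, gallery_feats, k):
    """Rank gallery indices by query-to-centroid distance (assumed proxy rule)."""
    centroids, labels = kmeans(gallery_feats, k)
    centroid_dists = np.linalg.norm(centroids - query_feat, axis=1)
    proxy_dists = centroid_dists[labels]  # each image inherits its centroid's distance
    return np.argsort(proxy_dists, kind="stable")
```

Compared with sorting query-to-image distances directly, this variant lets every image in a cluster share one score, so a noisy individual image feature is smoothed by its neighbors in the gallery.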
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
Methods: ALIGN
