ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval
Hao Ren, Ziqiang Zheng, Yang Wu, Hong Lu, Yang Yang, Ying Shan, Sai-Kit Yeung

TL;DR
ACNet introduces a joint sketch-to-photo synthesis and retrieval framework that generates diverse photo-like images guided by retrieval to improve zero-shot sketch-based image retrieval, achieving state-of-the-art results.
Contribution
The paper proposes ACNet, which jointly optimizes sketch-to-photo synthesis and image retrieval to bridge both the domain gap between sketches and photos and the knowledge gap between seen and unseen categories in zero-shot SBIR.
Findings
Achieves state-of-the-art performance on ZS-SBIR datasets.
Diverse generated photo-like images effectively alleviate overfitting.
Proxy-based NormSoftmax loss stabilizes training and enhances generalization.
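To make the last finding concrete, here is a minimal sketch of a proxy-based NormSoftmax loss for a single sample, written in plain Python. It follows the general NormSoftmax idea (cosine similarity between an L2-normalized embedding and L2-normalized class proxies, scaled by a temperature, then softmax cross-entropy); it is not taken from the ACNet code. The function names, the temperature value of 0.05, and the toy vectors are assumptions made for illustration.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def norm_softmax_loss(embedding, proxies, label, temperature=0.05):
    """Cross-entropy over temperature-scaled cosine similarities to class proxies.

    embedding:   feature vector of one sample
    proxies:     one vector per class (learnable in a real model)
    label:       index of the sample's ground-truth class
    temperature: assumed value; smaller values sharpen the softmax
    """
    e = l2_normalize(embedding)
    logits = []
    for p in proxies:
        p_hat = l2_normalize(p)
        cos = sum(a * b for a, b in zip(e, p_hat))
        logits.append(cos / temperature)
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Toy usage: two classes with hypothetical 2-D proxies.
proxies = [[1.0, 0.0], [0.0, 1.0]]
sample = [2.0, 0.1]  # points almost exactly along proxy 0
loss_correct = norm_softmax_loss(sample, proxies, label=0)
loss_wrong = norm_softmax_loss(sample, proxies, label=1)
```

Because both the embedding and the proxies are normalized, the loss depends only on directions, not magnitudes, and each sample is compared against one proxy per class rather than against other samples in the batch. In a real model the proxies would be trained together with the embedding network.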
Abstract
The huge domain gap between sketches and photos and the highly abstract sketch representations pose challenges for sketch-based image retrieval (SBIR). The zero-shot sketch-based image retrieval (ZS-SBIR) is more generic and practical but poses an even greater challenge because of the additional knowledge gap between the seen and unseen categories. To simultaneously mitigate both gaps, we propose an Approaching-and-Centralizing Network (termed "ACNet") to jointly optimize sketch-to-photo synthesis and the image retrieval. The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain, and thus better serve the retrieval module than ever to learn domain-agnostic representations and category-agnostic common knowledge for generalizing to unseen…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
