Dynamic Multi-level Weighted Alignment Network for Zero-shot Sketch-based Image Retrieval
Hanwen Su, Ge Song, Jiyan Wang, Yuanbo Zhu

TL;DR
This paper introduces a novel dynamic multi-level weighted alignment network for zero-shot sketch-based image retrieval, addressing modality imbalance and low-quality information issues to improve retrieval accuracy.
Contribution
It proposes a multi-component framework with a weighted alignment mechanism and a specialized loss to enhance zero-shot sketch-based image retrieval performance.
Findings
Outperforms state-of-the-art methods on three benchmark datasets.
Effectively balances domain discrepancies with a weighted quadruplet loss.
Improves alignment quality through multi-level weighting modules.
Abstract
The problem of zero-shot sketch-based image retrieval (ZS-SBIR) has attracted increasing attention due to its wide applications, e.g., e-commerce. Despite progress in this field, previous works suffer from imbalanced samples across modalities and inconsistent, low-quality information during training, resulting in sub-optimal performance. Therefore, in this paper, we introduce an approach called Dynamic Multi-level Weighted Alignment Network for ZS-SBIR. It consists of three components: (i) a Uni-modal Feature Extraction Module that includes a CLIP text encoder and a ViT for extracting textual and visual tokens, (ii) a Cross-modal Multi-level Weighting Module that produces an alignment weight list by the local and global aggregation blocks to measure the aligning quality of sketch and image samples, (iii) a Weighted Quadruplet Loss Module aiming to improve the balance of domains in…
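The abstract is cut off before it defines the weighted quadruplet loss, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes the generic quadruplet loss form (two hinge terms over squared Euclidean distances) and simply scales each quadruplet by a given alignment weight. The function name `weighted_quadruplet_loss`, the margins, the normalization by the weight sum, and the choice of which samples play anchor, positive, and negatives are all assumptions made for illustration; in the paper, the weights would come from the Cross-modal Multi-level Weighting Module.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def weighted_quadruplet_loss(anchors, positives, negatives1, negatives2,
                             weights, margin1=1.0, margin2=0.5):
    """Weighted average of a generic quadruplet loss (hypothetical sketch).

    anchors     : e.g. sketch embeddings (assumption)
    positives   : image embeddings of the same class as the anchor
    negatives1  : embeddings of a different class from the anchor
    negatives2  : embeddings of yet another class
    weights     : one non-negative alignment weight per quadruplet
    """
    total = 0.0
    weight_sum = 0.0
    for a, p, n1, n2, w in zip(anchors, positives, negatives1, negatives2, weights):
        d_ap = sq_dist(a, p)
        # Pull the positive closer to the anchor than the first negative.
        term1 = max(0.0, d_ap - sq_dist(a, n1) + margin1)
        # Also keep the anchor-positive distance below the negative-negative distance.
        term2 = max(0.0, d_ap - sq_dist(n1, n2) + margin2)
        total += w * (term1 + term2)
        weight_sum += w
    # Normalizing by the weight sum is an assumption, not taken from the paper.
    return total / weight_sum if weight_sum > 0 else 0.0


# Toy example: the first quadruplet is already well separated, the second is not.
loss = weighted_quadruplet_loss(
    anchors=[[0.0, 0.0], [0.0, 0.0]],
    positives=[[1.0, 0.0], [2.0, 0.0]],
    negatives1=[[3.0, 0.0], [1.0, 0.0]],
    negatives2=[[3.0, 4.0], [1.0, 1.0]],
    weights=[1.0, 0.5],
)
```

In this sketch a low weight reduces the influence of a quadruplet whose sketch and image are judged poorly aligned, which is one plausible reading of how an alignment weight list could feed into the loss.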
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
