Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval
Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song

TL;DR
This paper introduces a hierarchical cross-modal model for fine-grained sketch-based image retrieval, leveraging the inherent detail levels in sketches to improve matching accuracy over existing methods.
Contribution
It proposes a novel network that captures sketch hierarchies and uses cross-modal co-attention and hierarchical fusion for enhanced retrieval performance.
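As a rough illustration of the co-attention idea only, the sketch below lets each sketch region attend over photo regions and vice versa, then adds the attended context back onto the original features. It is not the authors' implementation: the dot-product affinity, the residual addition, and the names `co_attend`, `sketch_regions`, and `photo_regions` are assumptions chosen for illustration, and the hierarchical fusion step is not modelled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its matching weight."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def co_attend(sketch_regions, photo_regions):
    """Enrich each modality's region features with context from the other.

    Hypothetical helper: both inputs are lists of feature vectors that
    share one dimensionality. Returns (enriched_sketch, enriched_photo).
    """
    # affinity[i][j] scores sketch region i against photo region j
    # (assumed here to be a plain dot product).
    affinity = [[dot(s, p) for p in photo_regions] for s in sketch_regions]

    # Rows: each sketch region attends over all photo regions.
    sketch_ctx = [weighted_sum(softmax(row), photo_regions) for row in affinity]

    # Columns: each photo region attends over all sketch regions.
    columns = [list(col) for col in zip(*affinity)]
    photo_ctx = [weighted_sum(softmax(col), sketch_regions) for col in columns]

    # Residual enrichment (an assumption): original feature plus context.
    enriched_sketch = [[a + b for a, b in zip(s, c)]
                       for s, c in zip(sketch_regions, sketch_ctx)]
    enriched_photo = [[a + b for a, b in zip(p, c)]
                      for p, c in zip(photo_regions, photo_ctx)]
    return enriched_sketch, enriched_photo
```

Calling `co_attend` with two sketch regions and three photo regions of dimension 2 returns two enriched sketch vectors and three enriched photo vectors, each still of dimension 2, so the output can be matched region-by-region with whatever distance the retrieval step uses.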
Findings
Outperforms state-of-the-art methods on benchmark datasets
Effectively captures hierarchical sketch details for better matching
Significant improvement in retrieval accuracy
Abstract
Sketch as an image search query is an ideal alternative to text in capturing the fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date, that is, they are hierarchical in terms of the levels of detail -- a person typically sketches up to various extents of detail to depict an object. This hierarchical structure is often visually distinct. In this paper, we design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
