Fusion-supervised Deep Cross-modal Hashing
Li Wang, Lei Zhu, En Yu, Jiande Sun, Huaxiang Zhang

TL;DR
This paper introduces FDCH, a novel deep hashing method that learns unified binary codes for cross-modal retrieval, effectively capturing multi-modal correlations and semantic information to improve retrieval accuracy.
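To illustrate why unified binary codes are convenient for cross-modal retrieval, here is a minimal sketch of Hamming-distance ranking with a text query against an image database. The function names and the toy codes are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def hamming_distance(query_code, db_codes):
    """Hamming distance between one {-1, +1} code and every row of db_codes."""
    n_bits = db_codes.shape[1]
    # For +/-1 codes: inner product = n_bits - 2 * (number of differing bits).
    return (n_bits - db_codes @ query_code) // 2

def rank_by_hamming(query_code, db_codes):
    """Return database indices sorted from nearest to farthest, plus distances."""
    dists = hamming_distance(query_code, db_codes)
    return np.argsort(dists, kind="stable"), dists

# Toy data (hypothetical): a 4-bit text query and three 4-bit image codes.
text_query = np.array([1, -1, 1, 1])
image_db = np.array([
    [1, -1, 1, 1],     # identical to the query
    [-1, 1, -1, -1],   # every bit differs
    [1, -1, -1, 1],    # one bit differs
])

order, dists = rank_by_hamming(text_query, image_db)
print(order.tolist(), dists.tolist())  # [0, 2, 1] [0, 4, 1]
```

Because both modalities are mapped into the same code space, the query and the database items can be compared directly, whichever modality each came from.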
Contribution
FDCH proposes a fusion hash network that enhances multi-modal correlation modeling and supervises modality-specific hash networks using high-quality unified codes.
Findings
Achieves state-of-the-art performance on benchmark datasets
Effectively models heterogeneous multi-modal correlations
Preserves semantic consistency in cross-modal retrieval
Abstract
Deep hashing has recently received attention in cross-modal retrieval for its impressive advantages. However, existing hashing methods for cross-modal retrieval cannot fully capture heterogeneous multi-modal correlations or exploit semantic information. In this paper, we propose a novel Fusion-supervised Deep Cross-modal Hashing (FDCH) approach. First, FDCH learns unified binary codes through a fusion hash network that takes paired samples as input, which strengthens the modeling of correlations in heterogeneous multi-modal data. Then, these high-quality unified hash codes supervise the training of the modality-specific hash networks used to encode out-of-sample queries. Meanwhile, both pair-wise similarity information and classification information are embedded in the hash networks under a one-stream framework, which simultaneously preserves cross-modal…
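The two-stage flow described in the abstract can be sketched roughly as follows. This is a toy NumPy illustration under strong assumptions, not the authors' implementation: linear projections stand in for the deep networks, the pairwise loss uses a negative log-likelihood form that is common in deep hashing but is not confirmed by this summary, the classification term is omitted, and all names (`fusion_codes`, `supervision_loss`, `encode_query`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(x):
    """Map real values to {-1, +1}; zeros go to +1."""
    return np.where(x >= 0, 1.0, -1.0)

def fusion_codes(img_feats, txt_feats, w_fusion):
    """Stage 1 (sketch): fuse paired features and binarize to unified codes."""
    fused = np.concatenate([img_feats, txt_feats], axis=1)
    return binarize(fused @ w_fusion)

def pairwise_similarity(labels):
    """S[i, j] = 1 if samples i and j share at least one label, else 0."""
    return (labels @ labels.T > 0).astype(float)

def pairwise_loss(h, s):
    """Assumed pairwise term: negative log-likelihood of the similarities."""
    theta = 0.5 * h @ h.T
    return np.sum(np.logaddexp(0.0, theta) - s * theta)

def supervision_loss(h_modality, unified_codes):
    """Stage 2 (sketch): pull modality-specific outputs toward the fixed codes."""
    return np.sum((h_modality - unified_codes) ** 2)

def encode_query(x, w_modality):
    """Encode an out-of-sample query from a single modality."""
    return binarize(x @ w_modality)

# Toy paired data (hypothetical sizes): 6 image-text pairs, 3 labels, 4 bits.
n, d_img, d_txt, n_bits = 6, 8, 5, 4
img = rng.standard_normal((n, d_img))
txt = rng.standard_normal((n, d_txt))
labels = rng.integers(0, 2, size=(n, 3))
w_fusion = rng.standard_normal((d_img + d_txt, n_bits))

# Stage 1: unified codes from the fused input (w_fusion is untrained here).
codes = fusion_codes(img, txt, w_fusion)
S = pairwise_similarity(labels)
print("pairwise loss of unified codes:", pairwise_loss(codes, S))

# Stage 2: fit a linear image "hash network" to the unified codes by
# least squares, a stand-in for gradient-based training.
w_img, *_ = np.linalg.lstsq(img, codes, rcond=None)
fitted_loss = supervision_loss(img @ w_img, codes)
baseline_loss = supervision_loss(np.zeros_like(codes), codes)

# An unseen image query is encoded by the image-specific model alone.
query_code = encode_query(rng.standard_normal(d_img), w_img)
```

The design point the sketch tries to convey is the separation of roles: the fusion model sees both modalities and produces the target codes, while each modality-specific model only needs its own input at query time.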
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods
