Semi-supervised Multimodal Hashing
Dayong Tian, Maoguo Gong, Deyun Zhou, Jiao Shi, Yu Lei

TL;DR
This paper introduces a semi-supervised multimodal hashing method that effectively uses partial labels to generate binary codes for cross-modal data retrieval, reducing labeling effort while maintaining high performance.
Contribution
It proposes a novel semi-supervised approach that leverages fuzzy logic and label estimation to improve multimodal hashing with limited labeled data.
Findings
Achieves near-supervised performance with 50% labels
Outperforms some supervised methods with only 10% labels
Reduces need for extensive manual labeling in multimodal retrieval
Abstract
Retrieving nearest neighbors across correlated data in multiple modalities, such as image-text pairs on Facebook and video-tag pairs on YouTube, has become a challenging task due to the huge amount of data. Multimodal hashing methods that embed data into binary codes can boost the retrieving speed and reduce storage requirement. As unsupervised multimodal hashing methods are usually inferior to supervised ones, while the supervised ones requires too much manually labeled data, the proposed method in this paper utilizes a part of labels to design a semi-supervised multimodal hashing method. It first computes the transformation matrices for data matrices and label matrix. Then, with these transformation matrices, fuzzy logic is introduced to estimate a label matrix for unlabeled data. Finally, it uses the estimated label matrix to learn hashing functions for data in each modality to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Image Retrieval and Classification Techniques
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
