Learning Discriminative Hashing Codes for Cross-Modal Retrieval based on Multi-view Features
Jun Yu, Xiao-Jun Wu, and Josef Kittler

TL;DR
This paper introduces a multi-view hashing framework that leverages complementary features from images and texts to generate discriminative codes, significantly improving cross-modal retrieval performance.
Contribution
The proposed method combines multi-view feature fusion with a discrete hashing framework that jointly performs classifier learning and subspace learning.
Findings
Outperforms state-of-the-art methods on multiple datasets
Effectively fuses multi-view features for richer representations
Achieves superior retrieval accuracy in cross-modal tasks
Abstract
Hashing techniques are widely used in retrieval tasks because of their low storage requirements and fast processing. Many hashing methods based on a single view have been extensively studied for information retrieval. However, a single view has limited representation capacity and fails to capture some discriminative information, which limits the achievable gains. In this paper, we employ multiple views to represent images and texts, enriching the feature information. Our framework exploits the complementary information among multiple views to better learn discriminative compact hash codes. A discrete hashing learning framework that jointly performs classifier learning and subspace learning is proposed to complete multiple search tasks simultaneously. Our framework includes two stages, namely a kernelization process and a quantization process.…
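The abstract names a kernelization stage and a quantization stage but does not spell them out here. The following is only a rough sketch of what such a two-stage pipeline can look like, not the authors' method. It assumes an RBF kernel over anchor points, naive concatenation of views, and a random projection standing in for the learned one; all function names and parameters (rbf_kernelize, quantize, hamming_distances, sigma) are hypothetical.

```python
import numpy as np

def rbf_kernelize(X, anchors, sigma):
    """Map each sample to RBF similarities against a set of anchor points (assumed kernel)."""
    # Squared Euclidean distance between every sample and every anchor.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def quantize(V):
    """Binarize real-valued projections into {-1, +1} codes (zeros map to +1)."""
    return np.where(V >= 0, 1, -1).astype(np.int8)

def hamming_distances(query_code, db_codes):
    """Hamming distance for +/-1 codes of length r: (r - <q, b>) / 2."""
    r = db_codes.shape[1]
    dots = db_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (r - dots) // 2

rng = np.random.default_rng(0)

# Two hypothetical feature views of the same 6 images, fused by concatenation (assumption).
view_a = rng.normal(size=(6, 8))
view_b = rng.normal(size=(6, 4))
X = np.hstack([view_a, view_b])

# Stage 1: kernelization against 3 anchors taken from the data (assumption).
anchors = X[:3]
phi_raw = rbf_kernelize(X, anchors, sigma=1.0)
phi = phi_raw - phi_raw.mean(axis=0)  # center the kernel features

# Stage 2: quantization. A random matrix stands in for the learned projection.
W = rng.normal(size=(3, 16))
codes = quantize(phi @ W)

# Retrieval: rank database items by Hamming distance to the first item's code.
dists = hamming_distances(codes[0], codes)
ranking = np.argsort(dists, kind="stable")
```

Each item ends up as a 16-bit code, and ranking needs only integer dot products. This is where the low storage cost and fast search attributed to hashing come from.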
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
