Modal-aware Features for Multimodal Hashing
Haien Zeng, Hanjiang Lai, Hanlu Chu, Yong Tang, Jian Yin

TL;DR
This paper introduces a modal-aware operation that captures non-linear dependencies among multimodal features, improving the quality of joint representations for retrieval tasks.
Contribution
It proposes a novel modal-aware operation combining kernel and attention networks to better exploit inter-modality correlations during feature learning.
Findings
The authors report significant performance improvements over state-of-the-art methods.
Effective capture of non-linear dependencies among modalities.
Enhanced binary hash code representations for retrieval.
Abstract
Many retrieval applications can benefit from multiple modalities, e.g., text that contains images on Wikipedia, for which how to represent multimodal data is the critical component. Most deep multimodal learning methods typically involve two steps to construct the joint representations: 1) learning of multiple intermediate features, with each intermediate feature corresponding to a modality, using separate and independent deep models; 2) merging the intermediate features into a joint representation using a fusion strategy. However, in the first step, these intermediate features do not have previous knowledge of each other and cannot fully exploit the information contained in the other modalities. In this paper, we present a modal-aware operation as a generic building block to capture the non-linear dependences among the heterogeneous intermediate features that can learn the underlying…
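The two-step pipeline described in the abstract, followed by binarization into hash codes, can be sketched roughly as below. This is a minimal illustration of the conventional baseline (independent per-modality features, then fusion), not the paper's modal-aware operation, whose details are not given here. The random linear maps stand in for the deep models, and the concatenation fusion, the sign-based binarization, the dimensions, and all function names (`linear`, `encode`, `to_hash_code`, `hamming`) are assumptions made for illustration.

```python
import random


def linear(x, w):
    """Multiply a feature vector x (length d_in) by a weight matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights: a Gaussian random matrix."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def to_hash_code(joint):
    """Binarize a real-valued joint representation with the sign function."""
    return [1 if v >= 0 else -1 for v in joint]


def hamming(a, b):
    """Number of positions at which two hash codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def encode(image_feat, text_feat, w_img, w_txt, w_fuse):
    # Step 1: intermediate features, computed independently per modality.
    h_img = linear(image_feat, w_img)
    h_txt = linear(text_feat, w_txt)
    # Step 2: fuse by concatenation, then project to the code length.
    joint = linear(h_img + h_txt, w_fuse)
    return to_hash_code(joint)


rng = random.Random(0)
w_img = random_matrix(8, 4, rng)    # image features: 8 dims -> 4 dims
w_txt = random_matrix(6, 4, rng)    # text features: 6 dims -> 4 dims
w_fuse = random_matrix(8, 16, rng)  # fused 8 dims -> 16-bit code

# A toy database of (image feature, text feature) pairs.
database = [
    ([rng.gauss(0, 1) for _ in range(8)], [rng.gauss(0, 1) for _ in range(6)])
    for _ in range(5)
]
codes = [encode(img, txt, w_img, w_txt, w_fuse) for img, txt in database]

# Retrieval: rank database items by Hamming distance to the query code.
query = codes[2]
ranking = sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))
```

In this baseline, `h_img` and `h_txt` are computed without any knowledge of each other, which is the limitation the abstract points to in step 1.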
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods
