Modal-aware Features for Multimodal Hashing

Haien Zeng; Hanjiang Lai; Hanlu Chu; Yong Tang; Jian Yin

arXiv:1911.08479·cs.CV·November 21, 2019

Open Access

TL;DR

This paper introduces a modal-aware operation that captures non-linear dependencies among multimodal features, improving the quality of joint representations for retrieval tasks.

Contribution

It proposes a novel modal-aware operation combining kernel and attention networks to better exploit inter-modality correlations during feature learning.

Findings

01. Significant performance improvements over state-of-the-art methods.

02. Effective capture of non-linear dependencies among modalities.

03. Enhanced binary hash code representations for retrieval.

Abstract

Many retrieval applications can benefit from multiple modalities, e.g., Wikipedia text that contains images, and for these applications the representation of multimodal data is the critical component. Most deep multimodal learning methods construct the joint representation in two steps: 1) learning multiple intermediate features, one per modality, using separate and independent deep models; 2) merging the intermediate features into a joint representation using a fusion strategy. However, in the first step, the intermediate features have no prior knowledge of each other and cannot fully exploit the information contained in the other modalities. In this paper, we present a modal-aware operation as a generic building block to capture the non-linear dependencies among the heterogeneous intermediate features that can learn the underlying…
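The two-step pipeline described in the abstract, and the idea of letting each intermediate feature see the other modality before fusion, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the sigmoid gate in `cross_modal_gate`, the random weights, the feature sizes, and the sign-based `hash_code` are hypothetical stand-ins, not the kernel and attention networks the paper actually proposes.

```python
import math
import random


def linear(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def cross_modal_gate(own, other, W_gate):
    """Rescale `own` by a sigmoid gate computed from `other`.

    Hypothetical form of a modality-aware step: each dimension of one
    modality's feature is weighted by a value in (0, 1) that depends on
    the other modality's feature.
    """
    gate = [sigmoid(g) for g in linear(W_gate, other)]
    return [o * g for o, g in zip(own, gate)]


def hash_code(joint, W_hash):
    """Binarize a linear projection of the joint feature into {-1, +1}."""
    return [1 if p >= 0 else -1 for p in linear(W_hash, joint)]


rng = random.Random(0)

# Step 1 stand-ins: intermediate features from two independent branches.
img_feat = [rng.gauss(0.0, 1.0) for _ in range(8)]
txt_feat = [rng.gauss(0.0, 1.0) for _ in range(6)]

# Modality-aware step: each feature is gated by the other modality.
W_img_gate = random_matrix(8, 6, rng)   # text feature -> gate over image dims
W_txt_gate = random_matrix(6, 8, rng)   # image feature -> gate over text dims
img_aware = cross_modal_gate(img_feat, txt_feat, W_img_gate)
txt_aware = cross_modal_gate(txt_feat, img_feat, W_txt_gate)

# Step 2: fuse by concatenation, then binarize into a 16-bit code.
joint = img_aware + txt_aware
W_hash = random_matrix(16, len(joint), rng)
code = hash_code(joint, W_hash)
```

Without the gating step, `joint` would simply be `img_feat + txt_feat`, which corresponds to the independent-branch setup the abstract criticizes. The gate is only one simple way to make one feature depend on the other.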

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods