Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing

Tuan Hoang; Thanh-Toan Do; Tam V. Nguyen; Ngai-Man Cheung

arXiv:2112.06489 · cs.CV · December 14, 2021

TL;DR

This paper introduces a novel unsupervised deep cross-modal hashing method that maximizes mutual information to learn binary representations preserving intra- and inter-modal similarities, improving retrieval performance.
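Binary codes make retrieval efficient because a query code from one modality can be compared with database codes from another modality using Hamming distance, which is just an XOR followed by a bit count. The sketch below illustrates only this generic retrieval step, not how CMIMH learns its codes. The function names (`pack_bits`, `hamming`, `rank_database`) and the toy 8-bit codes are hypothetical.

```python
from typing import List

def pack_bits(bits: List[int]) -> int:
    """Pack a list of {0,1} bits into one int (bit i goes to position i)."""
    code = 0
    for i, b in enumerate(bits):
        if b:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """Hamming distance between two packed codes: popcount of their XOR."""
    return bin(a ^ b).count("1")

def rank_database(query: int, database: List[int]) -> List[int]:
    """Indices of database codes sorted by Hamming distance to the query (ties by index)."""
    return sorted(range(len(database)), key=lambda i: (hamming(query, database[i]), i))

# Hypothetical image query ranked against a tiny database of text codes.
image_query = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])
text_db = [
    pack_bits([1, 0, 1, 1, 0, 0, 1, 1]),  # differs in 1 bit
    pack_bits([0, 1, 0, 0, 1, 1, 0, 1]),  # differs in all 8 bits
    pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),  # identical
]
print(rank_database(image_query, text_db))  # [2, 0, 1]
```

Packing bits into integers is what lets a hashing system compare codes with a few machine instructions instead of floating-point distance computations.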

Contribution

The paper proposes CMIMH, a new mutual information maximization approach for unsupervised cross-modal hashing that balances modality gap reduction and private information preservation.

Findings

1. Outperforms state-of-the-art methods on benchmark datasets.
2. Effectively preserves intra- and inter-modal similarities.
3. Balances modality gap reduction with private information retention.

Abstract

In this paper, we adopt the mutual information (MI) maximization approach to tackle the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH). First, to learn informative representations that preserve both intra- and inter-modal similarities, we leverage recent advances in estimating variational lower bounds of MI to maximize the MI between the binary representations and the input features, and between the binary representations of different modalities. By jointly maximizing these MIs under the assumption that the binary representations follow multivariate Bernoulli distributions, we can effectively learn binary representations that preserve both intra- and inter-modal similarities in a mini-batch manner with gradient descent. Furthermore, we find that trying to…
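The abstract does not say which variational MI bound CMIMH uses, so the sketch below substitutes InfoNCE, one widely used lower bound, to show what "maximizing MI between binary representations of different modalities in a mini-batch" can look like. Assumptions, none taken from the paper: each bit is Bernoulli with p = sigmoid(logit) and is relaxed to its expected signed value E[2b - 1] = tanh(logit / 2); the critic is a temperature-scaled inner product; matched image-text pairs share a row index. All names are hypothetical.

```python
import math
from typing import List

Code = List[float]

def expected_signed_code(logits: Code) -> Code:
    """Expected value of a {-1,+1} code whose bits are independent Bernoulli
    variables with p = sigmoid(logit): E[2b - 1] = 2p - 1 = tanh(logit / 2)."""
    return [math.tanh(z / 2.0) for z in logits]

def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def infonce_lower_bound(img_codes: List[Code], txt_codes: List[Code],
                        temperature: float = 1.0) -> float:
    """InfoNCE estimate of a lower bound on I(image codes; text codes).

    Row i of each batch is a matched pair (positive); every other text in the
    batch is a negative for image i. The returned value is at most log(N).
    """
    n = len(img_codes)
    total = 0.0
    for i in range(n):
        # Critic: temperature-scaled inner product between relaxed codes.
        scores = [sum(a * b for a, b in zip(img_codes[i], txt_codes[j])) / temperature
                  for j in range(n)]
        total += scores[i] - logsumexp(scores)
    return math.log(n) + total / n

# Toy mini-batch of two matched image-text pairs with 4-bit codes.
img = [expected_signed_code(z) for z in [[4.0, -4.0, 4.0, -4.0], [-4.0, 4.0, -4.0, 4.0]]]
txt = [expected_signed_code(z) for z in [[3.0, -3.0, 3.0, -3.0], [-3.0, 3.0, -3.0, 3.0]]]
print(infonce_lower_bound(img, txt))        # just below log(2): pairs are well aligned
print(infonce_lower_bound(img, txt[::-1]))  # negative: pairs deliberately mismatched
```

This only evaluates the bound. In training, one would compute the same quantity inside an autodiff framework and maximize it with respect to the encoders' logits; an analogous bound between codes and (projected) input features would cover the intra-modal term the abstract mentions.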

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications