Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
Wei Chen, Yu Liu, Erwin M. Bakker, Michael S. Lew

TL;DR
This paper introduces a cross-modal retrieval method that combines information theory and adversarial learning to bridge the heterogeneity gap and the semantic gap between visual and textual data, improving feature alignment.
Contribution
It proposes an integrated framework using Shannon information entropy and adversarial training to reduce distribution and semantic gaps in cross-modal retrieval.
Findings
Effective in reducing modality distribution discrepancy.
Improves intra- and inter-modality feature similarity.
Validated on four benchmarks with four deep models.
Abstract
Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community. To address the challenges posed by the heterogeneity gap and the semantic gap, we propose integrating Shannon information theory and adversarial learning. In terms of the heterogeneity gap, we integrate modality classification and information entropy maximization adversarially. For this purpose, a modality classifier (as a discriminator) is built to distinguish the text and image modalities according to their different statistical properties. This discriminator uses its output probabilities to compute Shannon information entropy, which measures the uncertainty of the modality classification it performs. Moreover, feature encoders (as a generator) project uni-modal features into a commonly shared space and attempt to fool the discriminator by maximizing its output…
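The entropy signal described in the abstract can be illustrated with a minimal sketch. It assumes a two-way (image vs. text) modality classifier that outputs a single probability, and uses base-2 logarithms. The function names `modality_entropy` and `mean_entropy` and the sample probabilities are hypothetical, chosen only for illustration. This is not the paper's actual loss, which is not given in this excerpt.

```python
import math


def modality_entropy(p_image: float) -> float:
    """Shannon entropy (in bits) of a two-way modality prediction.

    p_image is the assumed discriminator probability that a feature
    comes from the image modality; 1 - p_image is the text probability.
    """
    probs = (p_image, 1.0 - p_image)
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mean_entropy(batch: list[float]) -> float:
    """Average entropy over a batch of discriminator outputs."""
    return sum(modality_entropy(p) for p in batch) / len(batch)


# Hypothetical discriminator outputs, for illustration only.
confident = [0.99, 0.02, 0.97]  # modalities are easy to tell apart
confused = [0.50, 0.55, 0.45]   # modalities are nearly indistinguishable

print("confident discriminator:", mean_entropy(confident))
print("confused discriminator: ", mean_entropy(confused))
```

Under these assumptions, the entropy is highest when the classifier assigns equal probability to both modalities, which is the state the feature encoders push toward when they try to fool the discriminator.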
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
Methods: Triplet Loss
