CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Yuxin Peng; Jinwei Qi; Yuxin Yuan

arXiv:1710.05106 · cs.MM · April 27, 2018 · 66 cites

TL;DR

This paper introduces CM-GANs, a novel cross-modal GAN framework that models joint distributions of heterogeneous data like images and text to learn discriminative common representations, improving cross-modal retrieval.

Contribution

The paper proposes the first GAN-based approach for cross-modal common representation learning, combining generative and discriminative models with autoencoders and adversarial mechanisms.

Findings

1. Outperforms 10 methods on 3 datasets in cross-modal retrieval.
2. Effectively models joint distribution of different modalities.
3. Enhances discriminative power of common representations.

Abstract

The inconsistent distributions and representations of different modalities, such as image and text, create a heterogeneity gap that makes it challenging to correlate such heterogeneous data. Generative adversarial networks (GANs) have shown a strong ability to model data distributions and learn discriminative representations, but existing GAN-based works mainly focus on the generative problem of producing new data. We have a different goal: to correlate heterogeneous data by using the power of GANs to model the cross-modal joint distribution. Thus, we propose Cross-modal GANs (CM-GANs) to learn discriminative common representations for bridging the heterogeneity gap. The main contributions are: (1) A cross-modal GANs architecture is proposed to model the joint distribution over data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis