Multimodal Adversarially Learned Inference with Factorized Discriminators

Wenxue Chen; Jianke Zhu

arXiv:2112.10384 · cs.LG · December 21, 2021

Open Access

TL;DR

This paper introduces a novel multimodal generative model using adversarial learning and contrastive discriminator training, improving representation quality and outperforming existing methods on benchmark datasets.

Contribution

It proposes a new factorized discriminator for multimodal GANs that enables efficient contrastive training on unimodal data, enhancing multimodal generative modeling.
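
As a rough illustration of why a factorized discriminator can make use of unimodal data, the sketch below uses a generic log-density-ratio identity rather than the paper's actual construction, which this page does not spell out. It assumes two discrete modalities, with the two distributions a discriminator would tell apart given as small joint probability tables. The names `P`, `Q`, `marginals`, `factorized_logit`, and `joint_logit` are hypothetical. The ideal joint logit, log p(x1, x2) / q(x1, x2), splits into two unimodal terms, each depending on a single modality and so estimable from unimodal samples alone, plus one cross-modal residual.

```python
import math

def marginals(joint):
    """Return the row and column marginals of a 2-D probability table."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return rows, cols

def joint_logit(p, q, i, j):
    """Ideal discriminator logit on the pair (i, j): log p(i, j) / q(i, j)."""
    return math.log(p[i][j] / q[i][j])

def factorized_logit(p, q, i, j):
    """Split the joint logit into two unimodal terms and a cross-modal residual."""
    p1, p2 = marginals(p)
    q1, q2 = marginals(q)
    # Unimodal terms: each depends on one modality only.
    unimodal_1 = math.log(p1[i] / q1[i])
    unimodal_2 = math.log(p2[j] / q2[j])
    # Residual: difference of pointwise mutual information under p and q.
    pmi_p = math.log(p[i][j] / (p1[i] * p2[j]))
    pmi_q = math.log(q[i][j] / (q1[i] * q2[j]))
    return unimodal_1, unimodal_2, pmi_p - pmi_q

# Illustrative (made-up) joint distributions over two binary modalities.
P = [[0.4, 0.1], [0.1, 0.4]]
Q = [[0.25, 0.25], [0.3, 0.2]]

for i in range(2):
    for j in range(2):
        parts = factorized_logit(P, Q, i, j)
        # The three parts always sum to the joint logit.
        assert abs(sum(parts) - joint_logit(P, Q, i, j)) < 1e-12
```

In a learned model each of the three terms would be a separate network. Under this assumed decomposition, only the residual term needs paired multimodal samples.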

Findings

01. Outperforms state-of-the-art methods on benchmark datasets

02. Effective use of contrastive learning with factorized discriminators

03. Improved multimodal data representation quality

Abstract

Learning from multimodal data is an important research topic in machine learning, with the potential to yield better representations. In this work, we propose a novel approach to generative modeling of multimodal data based on generative adversarial networks. To learn a coherent multimodal generative model, we show that it is necessary to align the different encoder distributions with the joint decoder distribution simultaneously. To this end, we construct a specific form of the discriminator that can be trained contrastively and enables our model to use data efficiently. By taking advantage of contrastive learning through factorizing the discriminator, we train our model on unimodal data. We have conducted experiments on benchmark datasets, and the promising results show that our proposed approach outperforms state-of-the-art methods on a variety of metrics. The source…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Speech and Audio Processing