Multimodal Adversarially Learned Inference with Factorized Discriminators
Wenxue Chen, Jianke Zhu

TL;DR
This paper introduces a novel multimodal generative model using adversarial learning and contrastive discriminator training, improving representation quality and outperforming existing methods on benchmark datasets.
Contribution
It proposes a new factorized discriminator for multimodal GANs that enables efficient contrastive training on unimodal data, enhancing multimodal generative modeling.
Findings
Outperforms state-of-the-art methods on benchmark datasets
Effective use of contrastive learning with factorized discriminators
Improved multimodal data representation quality
Abstract
Learning from multimodal data is an important research topic in machine learning, with the potential to yield better representations. In this work, we propose a novel approach to generative modeling of multimodal data based on generative adversarial networks. To learn a coherent multimodal generative model, we show that it is necessary to align different encoder distributions with the joint decoder distribution simultaneously. To this end, we construct a specific form of the discriminator that enables our model to utilize data efficiently and can be trained contrastively. By taking advantage of contrastive learning through factorizing the discriminator, we train our model on unimodal data. We have conducted experiments on benchmark datasets, and the promising results show that our proposed approach outperforms the state-of-the-art methods on a variety of metrics. The source…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Speech and Audio Processing
