Conditional Generative Modeling via Learning the Latent Space
Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, and Stephen Gould

TL;DR
This paper introduces a general-purpose conditional generative framework that uses latent variables to model multimodal outputs. Compared with existing generative methods, it converges faster and more stably, learns better representations, and produces diverse outputs.
Contribution
It presents a novel latent space learning approach for conditional generation that outperforms domain-specific pipelines in multimodal tasks.
Findings
Faster and more stable convergence in multimodal generation
Improved representations for downstream tasks
Diverse outputs from a simple generic model that can beat highly engineered, domain-specific pipelines
Abstract
Although deep learning has achieved appealing results on several machine learning tasks, most models are deterministic at inference, limiting their application to single-modal settings. We propose a novel general-purpose framework for conditional generation in multimodal spaces that uses latent variables to model generalizable learning patterns while minimizing a family of regression cost functions. At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes. Compared to existing generative solutions in multimodal spaces, our approach demonstrates faster and more stable convergence, and can learn better representations for downstream tasks. Importantly, it provides a simple generic model that can beat highly engineered pipelines tailored using domain expertise on a variety of tasks, while generating diverse outputs. Our codes…
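The idea of optimizing a latent variable so that one input can map to several output modes can be illustrated with a toy sketch. This is not the authors' implementation: the scalar `decoder`, the `optimize_latent` helper, the learning rate, and the two-mode target set are all assumptions made for illustration. It contrasts a deterministic mean-squared-error prediction, which collapses to the average of the modes, with per-mode latent optimization under a squared regression cost.

```python
import numpy as np


def decoder(x, z):
    """Hypothetical toy decoder (an assumption): the latent z scales the input x."""
    return z * x


def optimize_latent(x, y, z0=0.0, lr=0.1, steps=200):
    """Gradient descent on z for the squared regression cost (decoder(x, z) - y)**2."""
    z = z0
    for _ in range(steps):
        grad = 2.0 * (decoder(x, z) - y) * x  # derivative of (z*x - y)**2 w.r.t. z
        z -= lr * grad
    return z


x = 1.5
modes = [x, -x]  # two equally valid outputs for the same input (toy assumption)

# A deterministic regressor trained with mean squared error on both modes
# predicts their mean, which matches neither mode.
deterministic_prediction = float(np.mean(modes))

# With a latent variable, each mode is reached by a different optimized z.
latents = [optimize_latent(x, y) for y in modes]
outputs = [decoder(x, z) for z in latents]

print("deterministic prediction:", deterministic_prediction)
print("optimized latents:", latents)
print("outputs per latent:", outputs)
```

In this toy setting each value of `z` selects one member of a family of regression problems, so the two optimized latents recover the two modes separately, while the single deterministic prediction sits between them.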
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Video Analysis and Summarization
