Increasing the Generalisation Capacity of Conditional VAEs
Alexej Klushyn, Nutan Chen, Botond Cseke, Justin Bayer, Patrick van der Smagt

TL;DR
This paper enhances the generalisation ability of conditional variational autoencoders by modifying the latent space and prior, leading to better performance on structured prediction tasks with diverse solutions.
Contribution
It introduces a new approach to incentivise informative latent representations and employs a multimodal prior to improve generalisation in conditional VAEs.
Findings
Higher generalisation capability demonstrated on the Cornell Robot Grasping dataset and on modified versions of MNIST and Fashion-MNIST
Significant improvement over baseline models
Effective capturing of semantically meaningful features
Abstract
We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle such structured-prediction tasks by means of latent variables. We propose to incentivise informative latent representations for increasing the generalisation capacity of conditional variational autoencoders. To this end, we modify the latent variable model by defining the likelihood as a function of the latent variable only and introduce an expressive multimodal prior to enable the model to capture semantically meaningful features of the data. To validate our approach, we train our model on the Cornell Robot Grasping dataset and on modified versions of MNIST and Fashion-MNIST, obtaining results that show a significantly higher generalisation…
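The modification described in the abstract can be sketched in equations. This is a minimal illustration rather than the paper's exact formulation: the notation (input x, target y, latent z, encoder q_φ, parameters θ) is assumed here, and the Gaussian mixture in the last line is only one possible form of a multimodal prior, chosen for illustration. The first line is the usual conditional VAE bound, in which the likelihood sees both x and z. The second line makes the likelihood a function of z only, so all information about x must pass through the conditional prior.

```latex
\begin{align}
% Standard conditional VAE bound: the decoder is conditioned on x and z.
\log p_\theta(y \mid x)
  &\ge \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid x, z)\big]
     - \mathrm{KL}\big(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\big) \\
% Likelihood as a function of the latent variable only, assuming
% p_\theta(y \mid x) = \int p_\theta(y \mid z)\, p_\theta(z \mid x)\, dz.
\log p_\theta(y \mid x)
  &\ge \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid z)\big]
     - \mathrm{KL}\big(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\big) \\
% Illustrative multimodal prior (an assumption): a K-component Gaussian mixture.
p_\theta(z \mid x)
  &= \sum_{k=1}^{K} \pi_k(x)\, \mathcal{N}\big(z;\, \mu_k(x),\, \Sigma_k(x)\big),
  \qquad \sum_{k=1}^{K} \pi_k(x) = 1
\end{align}
```

Under this reading, a decoder that cannot see x directly cannot ignore z, which is one way to interpret "incentivising informative latent representations"; a multimodal prior then lets a single x map to several distinct regions of latent space, one per plausible solution.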
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
