Training $\beta$-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder
Jianning Li, Jana Fragemann, Seyed-Ahmad Ahmadi, Jens Kleesiek, and Jan Egger

TL;DR
This paper introduces a two-stage training method for $\beta$-VAE that aggregates a learned Gaussian posterior with a decoupled decoder, eliminating the need for hyperparameter tuning and improving reconstruction and disentanglement.
Contribution
The paper proposes a novel two-stage $\beta$-VAE training approach that aggregates a learned Gaussian posterior with a decoupled decoder, simplifying training and enhancing performance.
Findings
Satisfies the Gaussian latent space assumption with high fidelity.
Reconstruction error is comparable to that of a traditional $\beta$-VAE.
The new method requires no hyperparameter tuning of $\beta$.
Abstract
The reconstruction loss and the Kullback-Leibler divergence (KLD) loss in a variational autoencoder (VAE) often play antagonistic roles, and tuning the weight $\beta$ of the KLD loss in $\beta$-VAE to achieve a balance between the two losses is a tricky and dataset-specific task. As a result, current practices in VAE training often result in a trade-off between the reconstruction fidelity and the continuity/disentanglement of the latent space, if the weight $\beta$ is not carefully tuned. In this paper, we present intuitions and a careful analysis of the antagonistic mechanism of the two losses, and propose, based on the insights, a simple yet effective two-stage method for training a VAE. Specifically, the method aggregates a learned Gaussian posterior with a decoder decoupled from the KLD loss, which is trained to learn a new conditional distribution $p_{\phi}…
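To make the trade-off described in the abstract concrete, the following is a minimal NumPy sketch of the standard $\beta$-VAE objective: a reconstruction term plus a $\beta$-weighted KLD between a diagonal Gaussian posterior and a standard normal prior. It is not the paper's two-stage method, whose details are truncated above. The function names (`gaussian_kld`, `beta_vae_loss`), the squared-error reconstruction term, and the batch-mean reduction are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kld(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).

    Summed over latent dimensions, averaged over the batch.
    mu and logvar are arrays of shape (batch, latent_dim).
    """
    kld_per_sample = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(kld_per_sample))


def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Hypothetical helper: reconstruction loss + beta * KLD.

    The squared-error reconstruction term is an assumption; other
    likelihoods (e.g. binary cross-entropy) are equally common.
    Returns (total, reconstruction, kld) so the two terms can be inspected.
    """
    recon = float(np.mean(np.sum((x - x_recon) ** 2, axis=1)))
    kld = gaussian_kld(mu, logvar)
    return recon + beta * kld, recon, kld
```

With a larger `beta`, the KLD term dominates and pulls the posterior toward the prior at the expense of reconstruction; with a smaller `beta`, the reverse holds. This is the antagonism that makes the weight dataset-specific to tune.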
Taxonomy
Topics: AI in cancer detection · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
