# Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions

**Authors:** Zhilin Zheng, Li Sun

arXiv: 1812.09502 · 2019-03-18

## TL;DR

This paper proposes a VAE variant that splits the latent space into label-relevant and label-irrelevant dimensions, giving each class its own latent distribution and avoiding posterior collapse.

## Contribution

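The paper splits the latent space into label-relevant dimensions $\bm{\mathrm{z}}_s$, which follow a Gaussian mixture with one component per class, and label-irrelevant dimensions $\bm{\mathrm{z}}_u$, which follow the standard Gaussian prior. It shows that the method is equivalent to adding a KL divergence term on the joint distribution of $\bm{\mathrm{z}}_s$ and the class label $c$, which directly increases the mutual information between them.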

## Key findings

- Disentangled latent space improves class-specific representation.
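- The model can be extended to a GAN by adding a discriminator in the pixel domain, which yields high-quality and diverse images.
- Theoretical analysis shows the method is equivalent to adding a KL divergence term on the joint distribution of $\bm{\mathrm{z}}_s$ and the class label $c$, which directly increases their mutual information.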
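
For context on the last finding, the standard definition of mutual information already has the form of a KL divergence on a joint distribution. The identity below is a textbook fact, not the paper's derivation; the symbol $q$ for the encoder-induced distributions is an assumption made here, and the exact KL term the authors add is not reproduced in this summary.

$$
I(\bm{\mathrm{z}}_s; c)
= D_{\mathrm{KL}}\big(q(\bm{\mathrm{z}}_s, c)\,\|\,q(\bm{\mathrm{z}}_s)\,q(c)\big)
= \mathbb{E}_{q(c)}\Big[D_{\mathrm{KL}}\big(q(\bm{\mathrm{z}}_s \mid c)\,\|\,q(\bm{\mathrm{z}}_s)\big)\Big]
$$

Read this way, the mutual information is large when the class-conditional distributions of $\bm{\mathrm{z}}_s$ differ strongly from their marginal, which is what well-separated class-specific Gaussian components would encourage.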

## Abstract

A VAE requires a standard Gaussian prior in the latent space. Since all codes tend to follow the same prior, it often suffers from so-called "posterior collapse". To avoid this, this paper introduces a class-specific distribution for the latent code. Unlike CVAE, however, we present a method for disentangling the latent space into label-relevant and label-irrelevant dimensions, $\bm{\mathrm{z}}_s$ and $\bm{\mathrm{z}}_u$, for a single input. We apply two separate encoders to map the input into $\bm{\mathrm{z}}_s$ and $\bm{\mathrm{z}}_u$ respectively, and then give the concatenated code to the decoder to reconstruct the input. The label-irrelevant code $\bm{\mathrm{z}}_u$ represents the common characteristics of all inputs; hence it is constrained by the standard Gaussian, and its encoder is trained by amortized variational inference, as in a VAE. In contrast, $\bm{\mathrm{z}}_s$ is assumed to follow a Gaussian mixture distribution in which each component corresponds to a particular class. The parameters of the Gaussian components in the $\bm{\mathrm{z}}_s$ encoder are optimized by label supervision in a global stochastic way. In theory, we show that our method is equivalent to adding a KL divergence term on the joint distribution of $\bm{\mathrm{z}}_s$ and the class label $c$, and that it can directly increase the mutual information between $\bm{\mathrm{z}}_s$ and the label $c$. Our model can also be extended to a GAN by adding a discriminator in the pixel domain so that it produces high-quality and diverse images.
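
The two latent constraints described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the fixed per-class means in `class_means`, the unit-variance mixture components, and the use of a closed-form KL term for $\bm{\mathrm{z}}_s$ are all assumptions made for illustration. The paper instead optimizes the component parameters through label supervision, and that loss is not given in this summary.

```python
import math
import random


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p))))."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def latent_kl_terms(mu_u, logvar_u, mu_s, logvar_s, label, class_means):
    """Return (kl_u, kl_s) for one input.

    kl_u pulls the label-irrelevant code towards the standard Gaussian.
    kl_s pulls the label-relevant code towards the component of its class.
    Assumption: each component is N(class_means[label], I).
    """
    zeros_u = [0.0] * len(mu_u)
    kl_u = kl_diag_gaussians(mu_u, logvar_u, zeros_u, zeros_u)
    zeros_s = [0.0] * len(mu_s)
    kl_s = kl_diag_gaussians(mu_s, logvar_s, class_means[label], zeros_s)
    return kl_u, kl_s


def concatenated_code(mu_u, logvar_u, mu_s, logvar_s, rng):
    """Sample both codes and concatenate them, as the decoder input would be."""
    z_s = reparameterize(mu_s, logvar_s, rng)
    z_u = reparameterize(mu_u, logvar_u, rng)
    return z_s + z_u


if __name__ == "__main__":
    # Hypothetical encoder outputs for one input of class 1.
    class_means = {0: [-2.0, 0.0], 1: [2.0, 0.0]}
    mu_u, logvar_u = [0.3, -0.1, 0.0], [0.0, 0.0, 0.0]
    mu_s, logvar_s = [1.8, 0.2], [-1.0, -1.0]
    print(latent_kl_terms(mu_u, logvar_u, mu_s, logvar_s, 1, class_means))
    print(concatenated_code(mu_u, logvar_u, mu_s, logvar_s, random.Random(0)))
```

In this sketch, a code $\bm{\mathrm{z}}_s$ that sits on the component of the wrong class is penalized heavily, while $\bm{\mathrm{z}}_u$ is penalized only for drifting from the shared standard Gaussian, which mirrors the split between class-specific and common characteristics.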

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.09502/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1812.09502/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1812.09502/full.md

---
Source: https://tomesphere.com/paper/1812.09502