# Y-Autoencoders: disentangling latent representations via sequential-encoding

**Authors:** Massimiliano Patacchiola, Patrick Fox-Roberts, Edward Rosten

arXiv: 1907.10949 · 2019-07-26

## TL;DR

Y-Autoencoders (Y-AE) are an autoencoder-based model that splits the latent representation into an implicit part and an explicit part that is strongly correlated with the training labels. The authors report results on style-content separation, image-to-image translation, and inverse graphics.

## Contribution

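The paper introduces the Y-Autoencoder, a model that separates the latent space into explicit and implicit parts by splitting the encoder output into two paths (forming a Y shape) before decoding and re-encoding. A predictor embedded in the encoder monitors the explicit part and is trained end-to-end with no adversarial losses.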

## Key findings

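- Separates style and content in images by splitting the latent code into implicit and explicit parts.
- Applies the same model to image-to-image translation and inverse graphics.
- Needs no adversarial losses: the predictor that monitors projection onto the explicit manifold is embedded in the encoder and trained end-to-end.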

## Abstract

In the last few years there have been important advancements in generative models with the two dominant approaches being Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). However, standard Autoencoders (AEs) and closely related structures have remained popular because they are easy to train and adapt to different tasks. An interesting question is if we can achieve state-of-the-art performance with AEs while retaining their good properties. We propose an answer to this question by introducing a new model called Y-Autoencoder (Y-AE). The structure and training procedure of a Y-AE enclose a representation into an implicit and an explicit part. The implicit part is similar to the output of an autoencoder and the explicit part is strongly correlated with labels in the training set. The two parts are separated in the latent space by splitting the output of the encoder into two paths (forming a Y shape) before decoding and re-encoding. We then impose a number of losses, such as reconstruction loss, and a loss on dependence between the implicit and explicit parts. Additionally, the projection in the explicit manifold is monitored by a predictor, that is embedded in the encoder and trained end-to-end with no adversarial losses. We provide significant experimental results on various domains, such as separation of style and content, image-to-image translation, and inverse graphics.
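The two-path structure described in the abstract can be sketched as a single forward pass. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the linear encoder and decoder, the dimensions, the use of a randomly drawn label on the second path, and the four loss terms (`reconstruction`, `classification`, `explicit`, `implicit`) are all illustrative choices. The abstract itself names only a reconstruction loss and a loss on the dependence between the implicit and explicit parts; the `implicit` term below is one plausible way to express that dependence penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
X_DIM, H_IMPLICIT, N_CLASSES = 16, 4, 3

# Toy linear weights (assumption: a real model would use deep networks).
W_enc = rng.normal(scale=0.1, size=(X_DIM, H_IMPLICIT + N_CLASSES))
W_dec = rng.normal(scale=0.1, size=(H_IMPLICIT + N_CLASSES, X_DIM))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def encode(x):
    """Return (implicit code, explicit class probabilities from the predictor)."""
    h = x @ W_enc
    return h[:, :H_IMPLICIT], softmax(h[:, H_IMPLICIT:])


def decode(implicit, explicit):
    """Decode from the concatenated implicit and explicit parts."""
    return np.concatenate([implicit, explicit], axis=1) @ W_dec


def one_hot(labels):
    return np.eye(N_CLASSES)[labels]


def cross_entropy(probs, labels):
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())


def y_ae_losses(x, labels):
    # First encoding: the predictor inside the encoder outputs the explicit part.
    implicit, explicit_pred = encode(x)

    # Y split: one path carries the true label, the other a random label
    # (assumed here as a way to probe dependence between the two parts).
    random_labels = rng.integers(0, N_CLASSES, size=len(labels))
    x_left = decode(implicit, one_hot(labels))
    x_right = decode(implicit, one_hot(random_labels))

    # Second encoding: re-encode both decoded outputs with the same encoder.
    imp_left, exp_left = encode(x_left)
    imp_right, exp_right = encode(x_right)

    return {
        # The true-label path should reproduce the input.
        "reconstruction": float(((x_left - x) ** 2).mean()),
        # The predictor should recover the label from the input.
        "classification": cross_entropy(explicit_pred, labels),
        # Each re-encoded explicit part should match the label that was injected.
        "explicit": cross_entropy(exp_left, labels)
        + cross_entropy(exp_right, random_labels),
        # The implicit part should not change when only the label changes.
        "implicit": float(((imp_left - imp_right) ** 2).mean()),
    }


x = rng.normal(size=(8, X_DIM))
labels = rng.integers(0, N_CLASSES, size=8)
losses = y_ae_losses(x, labels)
total = sum(losses.values())  # unweighted sum; any weighting is left open here
```

In a trainable version, `total` would be minimised with respect to the encoder and decoder weights using an automatic differentiation library. No discriminator appears anywhere in the sketch, which mirrors the abstract's point that the predictor is trained without adversarial losses.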

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10949/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1907.10949/full.md

## References

27 references, with the full list in the complete paper: https://tomesphere.com/paper/1907.10949/full.md

---
Source: https://tomesphere.com/paper/1907.10949