Disentangling the Spatial Structure and Style in Conditional VAE

Ziye Zhang; Li Sun; Zhilin Zheng; Qingli Li

arXiv:1910.13062 · cs.CV · July 16, 2020 · 1 cites

Open Access

TL;DR

This paper introduces a method to disentangle spatial structure and style in a conditional VAE (cVAE). Separating the label-relevant latent factors from the label-irrelevant ones improves interpretability and control over the generated images.

Contribution

It proposes a novel disentanglement approach in cVAE that separates spatial and style information, with a flexible generator architecture utilizing adaptive normalization.

Findings

01 Effective disentanglement demonstrated on two datasets

02 Improved control over spatial and style features in generated images

03 Enhanced interpretability of latent space representations

Abstract

This paper aims to disentangle the latent space of a conditional VAE (cVAE) into a spatial structure code and a style code. The two codes are complementary: one of them, $z_{s}$, is label-relevant, and the other, $z_{u}$, is label-irrelevant. The generator is built from a connected encoder-decoder and a label condition mapping network. Depending on whether the label is related to the spatial structure, the output $z_{s}$ of the condition mapping network is used either as a spatial structure code or as a style code. The encoder provides the label-irrelevant posterior from which $z_{u}$ is sampled. The decoder applies $z_{s}$ and $z_{u}$ in each layer through adaptive normalization such as SPADE or AdaIN. Extensive experiments on two datasets with different types of labels show the effectiveness of our method.
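The abstract names two adaptive normalization schemes, AdaIN and SPADE, as the way the decoder consumes the two codes. The NumPy sketch below only illustrates the general difference between them: AdaIN applies one scale and shift per channel (a global style), while SPADE applies a scale and shift per channel and per pixel (a spatial layout). It is not the authors' implementation. The function names, the use of per-sample instance normalization in both cases, and the random arrays standing in for the learned projections of $z_{s}$ and $z_{u}$ are all assumptions made for illustration.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each channel of a (C, H, W) feature map to zero mean, unit variance."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def adain(x, gamma, beta):
    """AdaIN-style modulation: gamma and beta have shape (C,), one pair per channel."""
    return gamma[:, None, None] * instance_norm(x) + beta[:, None, None]


def spade(x, gamma_map, beta_map):
    """SPADE-style modulation: gamma_map and beta_map have shape (C, H, W)."""
    return gamma_map * instance_norm(x) + beta_map


# Hypothetical stand-ins: in a real decoder these parameters would be produced
# by learned layers from the style code and the spatial structure code.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
features = rng.normal(size=(C, H, W))

style_gamma = rng.normal(size=C)
style_beta = rng.normal(size=C)
styled = adain(features, style_gamma, style_beta)

spatial_gamma = rng.normal(size=(C, H, W))
spatial_beta = rng.normal(size=(C, H, W))
structured = spade(features, spatial_gamma, spatial_beta)

print(styled.shape, structured.shape)
```

With spatially constant maps, the SPADE-style function reduces to the AdaIN-style one, which is one way to see why a single decoder layer can accept either kind of code depending on whether the label describes spatial structure or style.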

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Video Analysis and Summarization

Methods: Spatially-Adaptive Normalization · Conditional Variational Auto Encoder