Disentangling the Spatial Structure and Style in Conditional VAE
Ziye Zhang, Li Sun, Zhilin Zheng, Qingli Li

TL;DR
This paper introduces a method to disentangle spatial structure and style in a conditional VAE (cVAE). Separating the label-relevant latent factors from the label-irrelevant ones improves interpretability and control over generated images.
Contribution
The paper proposes a disentanglement approach for cVAE that separates spatial-structure information from style information, using a flexible generator whose decoder injects both codes through adaptive normalization (SPADE or AdaIN).
Findings
Effective disentanglement demonstrated on two datasets
Improved control over spatial and style features in generated images
Enhanced interpretability of latent space representations
Abstract
This paper aims to disentangle the latent space in cVAE into the spatial structure and the style code, which are complementary to each other, with one of them being label-relevant and the other label-irrelevant. The generator is built from a connected encoder-decoder and a label condition mapping network. Depending on whether the label is related to the spatial structure, the output of the condition mapping network is used either as a style code or as a spatial structure code. The encoder provides the label-irrelevant posterior, from which the label-irrelevant code is sampled. The decoder employs both the label-relevant and the label-irrelevant code in each layer through adaptive normalization such as SPADE or AdaIN. Extensive experiments on two datasets with different types of labels show the effectiveness of our method.
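The conditioning mechanism named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The names `z_s` (label-relevant code), `z_u` (label-irrelevant code), the linear maps `w_style` and `w_struct`, all shapes, and the choice to apply the SPADE-style step before the AdaIN-style step within a layer are assumptions made for illustration. Instance normalization is used as the parameter-free normalization in both cases for simplicity. The sketch covers the case where the label relates to style, so `z_s` is a vector and `z_u` is a spatial map sampled from the encoder posterior.

```python
import numpy as np

def sample_posterior(mu, log_var, rng):
    """Reparameterized sample from a diagonal Gaussian N(mu, exp(log_var))."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (C, H, W) over its spatial positions."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def adain(x, style_code, weight):
    """AdaIN-style modulation: one scale and one shift per channel.

    weight is a hypothetical linear map of shape (2C, D) that turns the
    style vector (D,) into per-channel gamma and beta.
    """
    c = x.shape[0]
    params = weight @ style_code            # shape (2C,)
    gamma, beta = params[:c], params[c:]
    return gamma[:, None, None] * instance_norm(x) + beta[:, None, None]

def spade(x, structure_code, weight):
    """SPADE-style modulation: scale and shift vary per channel and per pixel.

    structure_code has shape (K, H, W); weight has shape (2C, K) and acts
    like a 1x1 convolution producing gamma and beta maps.
    """
    c = x.shape[0]
    params = np.einsum("ok,khw->ohw", weight, structure_code)  # (2C, H, W)
    gamma, beta = params[:c], params[c:]
    return gamma * instance_norm(x) + beta

rng = np.random.default_rng(0)
C, H, W, D, K = 4, 8, 8, 16, 3
features = rng.standard_normal((C, H, W))        # decoder feature map
z_s = rng.standard_normal(D)                     # stand-in for mapping-network output
z_u = sample_posterior(np.zeros((K, H, W)), np.zeros((K, H, W)), rng)
w_style = 0.1 * rng.standard_normal((2 * C, D))  # hypothetical learned weights
w_struct = 0.1 * rng.standard_normal((2 * C, K))

h = spade(features, z_u, w_struct)  # inject the spatial structure code
h = adain(h, z_s, w_style)          # inject the style code
```

When the label instead determines the spatial structure, the roles would swap: the mapping-network output would feed the SPADE-style step and the sampled code would feed the AdaIN-style step.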
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Video Analysis and Summarization
Methods: Spatially-Adaptive Normalization · Conditional Variational Auto Encoder
