TL;DR
This paper introduces a class- and layer-wise VAE framework for semantic image synthesis, enabling more diverse and controllable image generation by capturing multiple object style factors.
Contribution
It extends the VAE framework with multiple latent spaces for local and global control over object styles, improving diversity and plausibility in image synthesis.
Findings
Generated images are more diverse than those produced by state-of-the-art methods.
The approach produces plausible images across multiple domains.
Enables flexible image editing and synthesis applications.
Abstract
Semantic image synthesis is a process for generating photorealistic images from a single semantic mask. To enrich the diversity of multimodal image synthesis, previous methods have controlled the global appearance of an output image by learning a single latent space. However, a single latent code is often insufficient for capturing various object styles because object appearance depends on multiple factors. To handle individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class at the local to global levels by learning multiple latent spaces. Furthermore, we demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods via extensive experiments with real and synthetic datasets in three different…
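To make the idea of class-wise latent codes concrete, the following is a minimal sketch, not the authors' implementation. It assumes each semantic class has its own Gaussian posterior parameters, samples one latent code per class with the standard VAE reparameterization trick, and broadcasts that code over every pixel the class occupies in the mask. The function names, the toy mask, and the parameter values are all hypothetical, and the sketch covers only a single layer; the layer-wise part of the method would repeat this with separate latent spaces at several generator resolutions.

```python
import math
import random


def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return tuple(
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, logvar)
    )


def class_wise_style_map(mask, class_params, rng):
    """Sample one latent code per class and broadcast it over that class's pixels.

    mask:         2D list of integer class ids (the semantic mask).
    class_params: dict mapping class id -> (mu, logvar), assumed to come from
                  some per-class encoder (hypothetical, not shown here).
    Returns the per-pixel style map and the sampled per-class codes.
    """
    codes = {
        c: sample_latent(mu, logvar, rng)
        for c, (mu, logvar) in class_params.items()
    }
    style_map = [[codes[c] for c in row] for row in mask]
    return style_map, codes


# Hypothetical toy input: a 2x3 mask with two classes and 2-D latent codes.
mask = [
    [0, 0, 1],
    [0, 1, 1],
]
class_params = {
    0: ([0.0, 0.0], [0.0, 0.0]),
    1: ([1.0, -1.0], [-2.0, -2.0]),
}

style_map, codes = class_wise_style_map(mask, class_params, random.Random(0))
```

Resampling the code of one class while keeping the others fixed would change the style of that class alone, which is the kind of per-class control the abstract describes.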
