TL;DR
This paper introduces a multi-object image generation framework that produces high-fidelity, coherent images together with object annotations, without requiring explicit contextual information. This is useful in domains such as medical imaging.
Contribution
The proposed method combines a VQ-VAE with two autoregressive priors, PixelSNAIL and LayoutPixelSNAIL, to generate multi-object images that preserve spatial and semantic coherence without auxiliary input.
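As a rough illustration of the quantization step that any VQ-VAE relies on, the sketch below maps each vector of a small latent grid to the index of its nearest codebook entry. The resulting grid of discrete indices is the kind of representation an autoregressive prior such as PixelSNAIL is trained on. This is a minimal sketch and not the paper's implementation: the function names `nearest_code` and `quantize`, the toy 2-D codebook, and the 2 x 2 latent grid are all assumptions made for illustration, and the encoder, decoder, and priors are omitted.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def quantize(latent_grid, codebook):
    """Replace every vector in an H x W latent grid by its nearest codebook index."""
    return [[nearest_code(v, codebook) for v in row] for row in latent_grid]


# Hypothetical toy codebook with four 2-D entries (a real codebook is learned).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical encoder output: a 2 x 2 grid of 2-D latent vectors.
latent_grid = [
    [[0.1, -0.2], [0.9, 0.1]],
    [[0.2, 0.8], [1.1, 0.9]],
]

# Discrete index map: the sequence an autoregressive prior would model.
index_map = quantize(latent_grid, codebook)

# Looking the indices up again gives the quantized latents passed to a decoder.
decoded = [[codebook[k] for k in row] for row in index_map]

print(index_map)  # [[0, 1], [2, 3]]
```

At generation time the flow runs the other way: a prior samples an index map, the indices are looked up in the codebook, and a decoder turns the quantized latents into an image.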
Findings
Outperforms state-of-the-art multi-object generative methods on Multi-MNIST and CLEVR datasets.
Generated images maintain high fidelity and object coherence.
Augmenting training data with generated images improves model performance in medical imaging.
Abstract
Recent developments related to generative models have made it possible to generate diverse high-fidelity images. In particular, layout-to-image generation models have gained significant attention due to their capability to generate realistic complex images containing distinct objects. These models are generally conditioned on either semantic layouts or textual descriptions. However, unlike natural images, providing auxiliary information can be extremely hard in domains such as biomedical imaging and remote sensing. In this work, we propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring their contextual information during the generation process. Based on a vector-quantized variational autoencoder (VQ-VAE) backbone, our model learns to preserve spatial coherency within an image as well as semantic coherency between the…
Taxonomy
Methods: VQ-VAE
