Factor Decomposed Generative Adversarial Networks for Text-to-Image Synthesis
Jiguo Li, Xiaobin Liu, Lirong Zheng

TL;DR
This paper introduces FDGAN, a novel approach for text-to-image synthesis that decomposes noise and sentence embedding factors, leading to more efficient and disentangled image generation.
Contribution
The paper proposes a factor decomposition method in GANs for text-to-image synthesis, improving disentanglement and efficiency over traditional concatenation approaches.
Findings
FDGAN achieves better performance than baseline models.
FDGAN uses fewer parameters while maintaining quality.
Decomposition improves latent factor disentanglement.
Abstract
Prior work on text-to-image synthesis typically concatenates the sentence embedding with the noise vector, although the two are different factors that control different aspects of the generation. Simply concatenating them entangles the latent factors and encumbers the generative model. In this paper, we attempt to decompose these two factors and propose Factor Decomposed Generative Adversarial Networks (FDGAN). To achieve this, we first generate images from the noise vector and then apply the sentence embedding in the normalization layers of both the generator and the discriminators. We also design an additive norm layer to align and fuse the text-image features. The experimental results show that decomposing the noise and the sentence embedding can disentangle latent factors in text-to-image synthesis, and make the generative model more…
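The decomposition described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and does not reproduce the paper's additive norm layer, whose exact form is not given here. It assumes a generic conditional normalization in which the sentence embedding predicts a per-channel scale and shift. All names (`linear`, `text_conditioned_norm`, `generator_sketch`) and the weight matrices are hypothetical.

```python
from statistics import fmean, pvariance


def linear(weights, vec):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def text_conditioned_norm(features, sentence_emb, w_gamma, w_beta, eps=1e-5):
    """Normalize each channel, then modulate it with text-predicted values.

    features: list of channels, each a flat list of spatial activations.
    w_gamma, w_beta: hypothetical projections, one row per channel.
    """
    gammas = linear(w_gamma, sentence_emb)  # per-channel scale offsets
    betas = linear(w_beta, sentence_emb)    # per-channel shifts
    out = []
    for channel, gamma, beta in zip(features, gammas, betas):
        mean = fmean(channel)
        std = (pvariance(channel, mean) + eps) ** 0.5
        out.append([(1.0 + gamma) * (x - mean) / std + beta for x in channel])
    return out


def generator_sketch(noise, sentence_emb, w_image, w_gamma, w_beta, channels):
    """Assumed two-step flow: noise -> features, then text via normalization."""
    # Step 1: features are computed from the noise vector alone
    # (no concatenation with the sentence embedding).
    flat = linear(w_image, noise)
    size = len(flat) // channels
    features = [flat[c * size:(c + 1) * size] for c in range(channels)]
    # Step 2: the sentence embedding only enters through the norm layer.
    return text_conditioned_norm(features, sentence_emb, w_gamma, w_beta)


if __name__ == "__main__":
    noise = [1.0, 2.0]
    sentence_emb = [1.0, 0.0]
    w_image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    w_gamma = [[0.0, 0.0], [0.0, 0.0]]
    w_beta = [[0.5, 0.0], [-1.0, 0.0]]
    print(generator_sketch(noise, sentence_emb, w_image, w_gamma, w_beta, 2))
```

In this sketch, changing `sentence_emb` leaves the noise-derived features of step 1 untouched and only alters the scale and shift applied in step 2, which is the kind of separation the abstract contrasts with concatenating the two inputs.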
Taxonomy
Topics: Image Processing and 3D Reconstruction · Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques
Methods: ALIGN
