Factor Decomposed Generative Adversarial Networks for Text-to-Image Synthesis

Jiguo Li; Xiaobin Liu; Lirong Zheng

arXiv:2303.13821·cs.MM·March 27, 2023·1 cites

Open Access

TL;DR

This paper introduces FDGAN, a novel approach for text-to-image synthesis that decomposes noise and sentence embedding factors, leading to more efficient and disentangled image generation.

Contribution

The paper proposes a factor decomposition method in GANs for text-to-image synthesis, improving disentanglement and efficiency over traditional concatenation approaches.

Findings

01 FDGAN achieves better performance than baseline models.

02 FDGAN uses fewer parameters while maintaining quality.

03 Decomposition improves latent factor disentanglement.

Abstract

Prior work on text-to-image synthesis typically concatenates the sentence embedding with the noise vector, even though the two are different factors that control different aspects of generation. Simply concatenating them entangles the latent factors and encumbers the generative model. In this paper, we attempt to decompose these two factors and propose Factor Decomposed Generative Adversarial Networks (FDGAN). To achieve this, we first generate images from the noise vector and then apply the sentence embedding in the normalization layers of both the generator and the discriminators. We also design an additive norm layer to align and fuse the text-image features. The experimental results show that decomposing the noise and the sentence embedding can disentangle latent factors in text-to-image synthesis, and make the generative model more…
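The contrast the abstract draws, concatenating the two factors versus injecting the text through a normalization layer, can be illustrated with a minimal NumPy sketch. This is not the paper's additive norm layer, whose exact form the abstract does not give; it is a generic text-conditioned normalization (scale and shift predicted from the sentence embedding), and every name, shape, and projection matrix below is a hypothetical choice made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def concat_conditioning(noise, sent_emb):
    """Baseline from the abstract: one entangled input vector."""
    return np.concatenate([noise, sent_emb], axis=-1)


def text_conditioned_norm(feat, sent_emb, w_gamma, w_beta, eps=1e-5):
    """Normalize features, then modulate them with text-predicted scale/shift.

    feat:     (N, C, H, W) feature maps computed from the noise alone
    sent_emb: (N, D) sentence embeddings
    w_gamma, w_beta: (D, C) hypothetical linear projections (assumed, not
                     taken from the paper)
    """
    # Per-channel statistics over batch and spatial axes, as in batch norm.
    mean = feat.mean(axis=(0, 2, 3), keepdims=True)
    var = feat.var(axis=(0, 2, 3), keepdims=True)
    normed = (feat - mean) / np.sqrt(var + eps)
    # The text enters only here, as a per-sample, per-channel affine transform.
    gamma = sent_emb @ w_gamma  # (N, C)
    beta = sent_emb @ w_beta    # (N, C)
    return normed * (1.0 + gamma[:, :, None, None]) + beta[:, :, None, None]


# Toy sizes, chosen arbitrarily for the sketch.
N, Z, D, C, H, W = 2, 8, 16, 4, 5, 5
noise = rng.standard_normal((N, Z))
sent = rng.standard_normal((N, D))

# Stand-in for the first generator layer: features depend on the noise only.
w_fc = rng.standard_normal((Z, C * H * W)) * 0.1
feat = (noise @ w_fc).reshape(N, C, H, W)

w_gamma = rng.standard_normal((D, C)) * 0.1
w_beta = rng.standard_normal((D, C)) * 0.1

entangled = concat_conditioning(noise, sent)               # shape (N, Z + D)
out = text_conditioned_norm(feat, sent, w_gamma, w_beta)   # shape (N, C, H, W)
```

In the decomposed path the first layer no longer needs input weights for the sentence embedding, which is one plausible reading of why such a design could use fewer parameters; the abstract itself does not spell out the mechanism.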

Taxonomy

Topics: Image Processing and 3D Reconstruction · Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques

Methods: ALIGN