CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

Jiadong Liang; Wenjie Pei; Feng Lu

arXiv:1912.08562 · cs.CV · July 14, 2020 · 6 cites

TL;DR

CPGAN introduces a content-parsing approach with a memory structure and object-aware semantics to improve text-to-image synthesis, achieving state-of-the-art results on COCO by focusing on semantic consistency.

Contribution

The paper proposes a content-parsing GAN framework that models the semantic correspondence between text and images at the level of individual words and image sub-regions, improving text-image alignment.

Findings

01. Inception Score on COCO improves from 35.69 to 52.73.

02. Modeling the semantic correspondence between text and image improves synthesis quality.

03. CPGAN outperforms existing methods on the COCO dataset.
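Since the headline finding is stated in Inception Score, a minimal sketch of how that metric is commonly computed may help in reading the numbers. It uses the standard definition, the exponential of the mean KL divergence between each image's predicted class distribution and the marginal over all images. The function name `inception_score` and the input array `probs` are hypothetical; in practice `probs` would come from a pretrained Inception classifier run on generated images, and evaluations often average over several splits, which this sketch omits.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from class probabilities of shape (N, K).

    `probs` is a hypothetical input: row i is p(y | x_i), the softmax
    output of a classifier for generated image x_i.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal class distribution p(y), averaged over all images.
    marginal = probs.mean(axis=0, keepdims=True)
    # KL(p(y|x) || p(y)) for each image; eps guards against log(0).
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Confident, evenly spread predictions over 4 classes score close to 4;
# identical uniform predictions score close to 1.
confident = np.eye(4)
uniform = np.full((4, 4), 0.25)
score_confident = inception_score(confident)
score_uniform = inception_score(uniform)
```

The score is high when each image is classified confidently and the predicted classes are diverse across images, which is why it is used as a proxy for both quality and variety.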

Abstract

Typical methods for text-to-image synthesis seek to design an effective generative architecture to model the text-to-image mapping directly. This is fairly arduous due to the cross-modality translation. In this paper we circumvent this problem by focusing on parsing the content of both the input text and the synthesized image thoroughly to model the text-to-image consistency at the semantic level. In particular, we design a memory structure to parse the textual content by exploring the semantic correspondence between each word in the vocabulary and its various visual contexts across relevant images during text encoding. Meanwhile, the synthesized image is parsed to learn its semantics in an object-aware manner. Moreover, we customize a conditional discriminator to model the fine-grained correlations between words and image sub-regions to push for the text-image semantic alignment. Extensive…
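To make the idea of fine-grained correlations between words and image sub-regions more concrete, here is a minimal sketch of a generic attention-based word-region matching score. It is not the paper's discriminator: the function `word_region_alignment`, the `temperature` parameter, and the cosine-based scoring are all assumptions chosen for illustration, and the inputs are plain feature arrays rather than outputs of the CPGAN encoders.

```python
import numpy as np

def word_region_alignment(words, regions, temperature=4.0):
    """Attend each word over image regions and score the match.

    words:   (T, D) array of word features (hypothetical input).
    regions: (R, D) array of image sub-region features (hypothetical input).
    Returns the (T, R) attention map and a scalar alignment score in [-1, 1].
    """
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    sim = w @ r.T                                   # cosine similarity, (T, R)
    logits = temperature * sim
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over regions
    context = attn @ regions                        # per-word visual context, (T, D)
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    word_scores = (w * c).sum(axis=1)               # cosine(word, its context)
    return attn, float(word_scores.mean())

# Toy check: regions that coincide with the words align better than
# regions orthogonal to every word.
basis = np.eye(8)
attn_match, score_match = word_region_alignment(basis[:4], basis[:4])
attn_miss, score_miss = word_region_alignment(basis[:4], basis[4:])
```

In a sketch like this, a higher score means each word finds some region that resembles it, which is the kind of signal a conditional discriminator could use to penalize images that ignore parts of the text.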

Code & Models

Repositories

dongdongdong666/CPGAN (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques