StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

Han Zhang; Tao Xu; Hongsheng Li; Shaoting Zhang; Xiaogang Wang; Xiaolei Huang; Dimitris Metaxas

arXiv:1612.03242·cs.CV·August 8, 2017

TL;DR

StackGAN introduces a two-stage generative adversarial network approach that synthesizes high-resolution, photo-realistic images from text descriptions by decomposing the task into sketching and refinement stages, improving detail and diversity.

Contribution

The paper presents a novel stacked GAN framework with a sketch-refinement process and a conditioning augmentation technique for better text-to-image synthesis.
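
The conditioning augmentation step named above can be sketched roughly as follows. As described in the paper, the generator is not fed a fixed text embedding; instead, a conditioning vector is sampled from a Gaussian whose mean and diagonal covariance are computed from the embedding, and a KL-divergence term pulls that Gaussian toward the standard normal. The function name, the list-based representation, and the example values below are assumptions made for illustration. The layer that would produce `mu` and `log_var` from the text embedding is hypothetical and omitted, and this is not the authors' implementation.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I) and return the KL term.

    mu and log_var are assumed to be the per-dimension outputs of a
    (hypothetical) fully connected layer applied to the text embedding.
    """
    # Reparameterized sample: sigma = exp(0.5 * log_var).
    c = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, log_var)]
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions.
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, log_var))
    return c, kl


rng = random.Random(0)
mu = [0.1, -0.2, 0.0, 0.3]         # illustrative values only
log_var = [-1.0, -0.5, 0.0, -2.0]  # illustrative values only

# Two draws for the same text give two different conditioning vectors.
c_a, kl = conditioning_augmentation(mu, log_var, rng)
c_b, _ = conditioning_augmentation(mu, log_var, rng)
print(c_a)
print(c_b)
print(kl)
```

Because each call draws fresh noise, the same description yields slightly different conditioning vectors, which is one way to read the diversity claim in the findings below.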

Findings

01 Achieves significant improvements in photo-realistic image quality.

02 Produces diverse images conditioned on text descriptions.

03 Outperforms previous state-of-the-art methods on benchmark datasets.

Abstract

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I…

Code & Models

Models

🤗 srikanthakkaru/KairosT2I (model)

Videos

Image Synthesis From Text With Deep Learning | Two Minute Papers #116 · YouTube

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks · YouTube

Taxonomy

Methods: Convolution