PerceptionGAN: Real-world Image Construction from Provided Text through Perceptual Understanding

Kanish Garg; Ajeet kumar Singh; Dorien Herremans; Brejesh Lall

arXiv:2007.00977·cs.CV·July 3, 2020

TL;DR

PerceptionGAN improves text-to-image generation by incorporating perceptual understanding at the initial image generation stage, leading to higher-quality, more realistic images with better object details and interactions.

Contribution

This paper adds a perceptual understanding module to the discriminator to improve the quality of the initial image in text-to-image synthesis, and integrates the module into the StackGAN framework.

Findings

01

Generated images are more realistic and detailed.

02

Improved perceptual information leads to better object shapes and colors.

03

The method outperforms state-of-the-art approaches on the MS COCO dataset.

Abstract

Generating an image from a descriptive text is challenging because it is difficult to incorporate perceptual information (object shapes, colors, and their interactions) while keeping the image highly relevant to the provided text. Current methods first generate an initial low-resolution image, which typically has irregular object shapes, colors, and interactions between objects. This initial image is then improved by conditioning on the text. However, these methods mainly address how to use the text representation efficiently when refining the initially generated image, while the success of this refinement depends heavily on the quality of that initial image, as pointed out in the DM-GAN paper. Hence, we propose a method to provide good initialized images by incorporating perceptual understanding in the discriminator module.…
