PerceptionGAN: Real-world Image Construction from Provided Text through Perceptual Understanding
Kanish Garg, Ajeet Kumar Singh, Dorien Herremans, Brejesh Lall

TL;DR
PerceptionGAN enhances text-to-image generation by incorporating perceptual understanding early in the process, leading to higher quality, more realistic images with better object details and interactions.
Contribution
This paper introduces a perceptual understanding module into the discriminator to improve initial image quality in text-to-image synthesis, integrated into the StackGAN framework.
Findings
Generated images are more realistic and detailed.
Improved perceptual information leads to better object shapes and colors.
The method outperforms state-of-the-art approaches on the MS COCO dataset.
Abstract
Generating an image from a descriptive text is a challenging task because it is difficult to incorporate perceptual information (object shapes, colors, and their interactions) while keeping the image highly relevant to the text. Current methods first generate an initial low-resolution image, which typically has irregular object shapes, colors, and interactions between objects. This initial image is then improved by conditioning on the text. However, these methods mainly address the problem of using the text representation efficiently when refining the initially generated image, while the success of this refinement depends heavily on the quality of that initial image, as pointed out in the DM-GAN paper. Hence, we propose a method to provide good initial images by incorporating perceptual understanding in the discriminator module.…
