Text to Image Synthesis using Stacked Conditional Variational Autoencoders and Conditional Generative Adversarial Networks

Haileleol Tibebu; Aadil Malik; Varuna De Silva

arXiv:2207.03332 · cs.CV · August 16, 2022

TL;DR

This paper introduces a novel stacked architecture combining Conditional Variational Autoencoders and Conditional GANs to generate high-resolution images from text descriptions, improving image quality and diversity.

Contribution

It proposes a new two-stage network architecture that leverages the strengths of VAEs and GANs for text-to-image synthesis, achieving high-resolution outputs.
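The two-stage idea can be illustrated with a minimal NumPy sketch of how data might flow through a stacked CVAE-then-CGAN pipeline. This is not the authors' implementation. The abstract is cut off before it describes Stage II, so several details are hypothetical and chosen only to show tensor shapes and where the text embedding enters each stage: the 16x16 sketch size, the 4x nearest-neighbour upsampling, the random linear weights, passing the text embedding to both stages (a StackGAN-style assumption), and every function name. The discriminator loss is the standard conditional-GAN objective, -E[log D(x|c)] - E[log(1 - D(G(z|c)|c))].

```python
import numpy as np


def stage1_cvae_decoder(z, text_emb, w):
    """Hypothetical Stage I: decode (latent z, text embedding) into a coarse 16x16 RGB sketch."""
    h = np.concatenate([z, text_emb], axis=1) @ w
    return np.tanh(h).reshape(-1, 16, 16, 3)


def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling of an (N, H, W, C) batch."""
    return img.repeat(factor, axis=1).repeat(factor, axis=2)


def stage2_cgan_generator(sketch, text_emb, w_text, factor=4):
    """Hypothetical Stage II: upsample the sketch and add a text-conditioned
    per-channel offset. A real generator would use learned conv/residual blocks."""
    up = upsample_nearest(sketch, factor)
    offset = (text_emb @ w_text)[:, None, None, :]  # (N, 1, 1, C), broadcast over pixels
    return np.tanh(up + offset)


def conditional_d_loss(d_real, d_fake, eps=1e-12):
    """Conditional-GAN discriminator loss, given D(x|c) on real images and
    D(G(z|c)|c) on generated ones (both probabilities in [0, 1])."""
    return float(-np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps)))


# Usage with random stand-ins for a sentence embedding and untrained weights.
rng = np.random.default_rng(0)
batch, z_dim, t_dim = 2, 8, 32
text_emb = rng.standard_normal((batch, t_dim))
z = rng.standard_normal((batch, z_dim))
w1 = rng.standard_normal((z_dim + t_dim, 16 * 16 * 3)) * 0.05
w2 = rng.standard_normal((t_dim, 3)) * 0.05
sketch = stage1_cvae_decoder(z, text_emb, w1)        # shape (2, 16, 16, 3)
image = stage2_cgan_generator(sketch, text_emb, w2)  # shape (2, 64, 64, 3)
```

In a trained model, the random matrices would be replaced by learned networks. The point of stacking is that Stage II only has to add detail to a layout Stage I already fixed, rather than generate a high-resolution image from noise in one step.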

Findings

01 Produces high-resolution images conditioned on text descriptions

02 Achieves competitive Inception Score (IS) and Fréchet Inception Distance (FID) results on benchmark datasets

03 Outperforms some existing state-of-the-art methods
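The Inception Score and FID mentioned above are computed from classifier outputs and feature statistics: IS = exp(E_x[KL(p(y|x) || p(y))]), where higher is better, and FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^{1/2}), where lower is better. The following is a minimal sketch. It assumes you already have per-image softmax probabilities and pooled features from an Inception-v3 network, which is not shown. The helper names are hypothetical. Published scores normally come from reference implementations with fixed preprocessing and sample counts, so numbers from this sketch are not directly comparable to the paper's.

```python
import numpy as np
from scipy import linalg


def inception_score(probs, eps=1e-12):
    """IS from an (N, classes) matrix of per-image class probabilities p(y|x)."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    covmean = np.real(linalg.sqrtm(sigma1 @ sigma2))  # drop tiny imaginary round-off
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))


def fid_from_features(feats_real, feats_fake):
    """Fit a Gaussian to each (N, D) feature matrix and compare them."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu1, s1, mu2, s2)
```

For example, confident and evenly spread predictions over 5 classes (`np.eye(5)`) give IS = 5, the maximum for 5 classes. Two identical feature sets give FID = 0.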

Abstract

Synthesizing a realistic image from a textual description is a major challenge in computer vision. Current text-to-image synthesis approaches fall short of producing a high-resolution image that represents a text descriptor. Most existing studies rely on either Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs). GANs can produce sharper images but lack diversity in their outputs, whereas VAEs are good at producing a diverse range of outputs, but the images they generate are often blurred. Taking into account the relative advantages of both GANs and VAEs, we propose a new stacked Conditional VAE (CVAE) and Conditional GAN (CGAN) network architecture for synthesizing images conditioned on a text description. This study uses Conditional VAEs as an initial generator to produce a high-level sketch of the text descriptor. This high-level sketch output from…
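The first stage relies on a conditional VAE. The following minimal NumPy sketch shows the generic CVAE mechanics: the text embedding c is concatenated to both the encoder input and the latent code; training uses the reparameterization trick with a reconstruction-plus-KL loss; and generation samples z ~ N(0, I) and decodes it together with c. The linear layers, dimensions, squared-error reconstruction term, and the `ToyCVAE` class are illustrative assumptions, not the paper's network. NumPy has no autograd, so the sketch only evaluates the loss and does not train.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """Per-sample KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)


class ToyCVAE:
    """Hypothetical linear CVAE: the condition c is appended to the encoder
    input and to the latent code before decoding."""

    def __init__(self, x_dim, c_dim, z_dim, rng):
        self.w_enc = rng.standard_normal((x_dim + c_dim, 2 * z_dim)) * 0.01
        self.w_dec = rng.standard_normal((z_dim + c_dim, x_dim)) * 0.01
        self.z_dim = z_dim

    def encode(self, x, c):
        h = np.concatenate([x, c], axis=1) @ self.w_enc
        return h[:, : self.z_dim], h[:, self.z_dim :]  # mu, logvar

    def decode(self, z, c):
        return np.tanh(np.concatenate([z, c], axis=1) @ self.w_dec)

    def loss(self, x, c, rng):
        """Mean over the batch of squared-error reconstruction plus KL."""
        mu, logvar = self.encode(x, c)
        x_hat = self.decode(reparameterize(mu, logvar, rng), c)
        recon = np.sum((x - x_hat) ** 2, axis=1)
        return float(np.mean(recon + kl_to_standard_normal(mu, logvar)))

    def sample(self, c, rng):
        """Generation: draw z from the prior and decode it with the text condition."""
        z = rng.standard_normal((c.shape[0], self.z_dim))
        return self.decode(z, c)


# Usage with random stand-ins for images and text embeddings.
rng = np.random.default_rng(0)
model = ToyCVAE(x_dim=16 * 16 * 3, c_dim=32, z_dim=8, rng=rng)
x = rng.uniform(-1, 1, (4, 16 * 16 * 3))  # flattened "images" in [-1, 1]
c = rng.standard_normal((4, 32))          # stand-in sentence embeddings
print(model.loss(x, c, rng))
sketches = model.sample(c, rng).reshape(4, 16, 16, 3)
```

The KL term pulls the latent posterior toward the prior. This is what lets a VAE produce varied outputs for the same text by resampling z, and it is the diversity property the abstract credits VAEs with.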

Taxonomy

Methods: Batch Normalization · Convolution · Residual Connection · Residual Block
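The taxonomy lists convolution, batch normalization, and residual blocks. The paper's exact block layout is not given here, so the following is a generic ResNet-style residual block in NumPy, offered as a sketch under assumptions: 3x3 "same" convolutions (cross-correlation, as in deep-learning libraries), training-mode batch statistics, and the ordering conv, BN, ReLU, conv, BN, then an identity shortcut and a final ReLU. The function names and channel count are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """3x3 'same' convolution. x: (N, H, W, Cin), w: (3, 3, Cin, Cout)."""
    n, h, wd, _ = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)))  # zero-pad height and width by 1
    out = np.zeros((n, h, wd, w.shape[3]))
    for i in range(3):
        for j in range(3):
            # Shifted window times the (Cin, Cout) kernel slice at offset (i, j).
            out += np.einsum("nhwc,cd->nhwd", xp[:, i : i + h, j : j + wd, :], w[i, j])
    return out


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm: normalize each channel over (N, H, W)."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def residual_block(x, w1, w2, g1, b1, g2, b2):
    """conv -> BN -> ReLU -> conv -> BN, add the identity shortcut, then ReLU."""
    h = np.maximum(batch_norm(conv2d_same(x, w1), g1, b1), 0.0)
    h = batch_norm(conv2d_same(h, w2), g2, b2)
    return np.maximum(x + h, 0.0)


# Usage: the block preserves the input shape, so blocks can be stacked freely.
rng = np.random.default_rng(0)
c = 8
x = rng.standard_normal((2, 16, 16, c))
w1, w2 = (rng.standard_normal((3, 3, c, c)) * 0.1 for _ in range(2))
gamma, beta = np.ones(c), np.zeros(c)
y = residual_block(x, w1, w2, gamma, beta, gamma, beta)
print(y.shape)  # (2, 16, 16, 8)
```

The identity shortcut means that a block whose convolutions contribute nothing reduces to ReLU(x). Deep generators use this so that extra blocks can refine features without having to relearn them.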