Text-To-Image with Generative Adversarial Networks
Mehrshad Momen-Tayefeh

TL;DR
This paper compares five GAN-based text-to-image methods, analyzing their architectures and image resolutions, to identify the most effective approach for generating realistic images from textual descriptions.
Contribution
It provides a comparative analysis of different GAN architectures for text-to-image synthesis, highlighting their performance across various resolutions.
Findings
Best resolution achieved was 256x256
Model performance varied significantly across approaches
Identified the most effective GAN model for text-to-image generation
Abstract
Generating realistic images from human-written text is one of the most challenging problems in computer vision (CV). Existing text-to-image approaches can roughly reflect the meaning of the given descriptions. In this paper, our main purpose is to present a brief comparison of five different methods based on Generative Adversarial Networks (GANs) for generating images from text. Each model architecture synthesizes images at a different resolution; the lowest and highest resolutions obtained are 64x64 and 256x256, respectively. We also examined and compared several metrics that indicate the accuracy of each model. By comparing the essential metrics of these approaches, we identified the best model for this problem.
Taxonomy
Topics: Digital Media Forensic Detection · Image Processing and 3D Reconstruction · Generative Adversarial Networks and Image Synthesis
Methods: Balanced Selection
