Fine-grained Text to Image Synthesis
Xu Ouyang, Ying Chen, Kaiyue Zhu, Gady Agam

TL;DR
This paper improves fine-grained text-to-image synthesis by adding an auxiliary classifier to the discriminator and a contrastive learning method to the RAT GAN model, so that synthesized images carry more accurate fine-grained details.
Contribution
It introduces a novel approach combining auxiliary classifiers and contrastive learning to improve fine-grained detail accuracy in GAN-based text-to-image synthesis.
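To make the auxiliary-classifier idea concrete, here is a minimal sketch of a discriminator loss with an added class-prediction term. It assumes an AC-GAN-style setup in which the discriminator returns a real/fake logit plus a vector of class logits. The binary cross-entropy adversarial term, the `aux_weight` parameter, and all function names are illustrative assumptions, not the paper's formulation.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def class_cross_entropy(class_logits, label):
    # Cross-entropy of the predicted class distribution against the true class index.
    return -log_softmax(class_logits)[label]

def bce_with_logit(logit, target):
    # Stable binary cross-entropy on a raw logit; target is 1.0 (real) or 0.0 (fake).
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def discriminator_loss(real_out, fake_out, real_label, aux_weight=1.0):
    # real_out / fake_out: (real_fake_logit, class_logits) pairs (assumed layout).
    real_logit, real_class_logits = real_out
    fake_logit, _ = fake_out
    adversarial = bce_with_logit(real_logit, 1.0) + bce_with_logit(fake_logit, 0.0)
    # Auxiliary term: the discriminator must also name the fine-grained class
    # of the real image. aux_weight is a hypothetical balancing factor.
    auxiliary = class_cross_entropy(real_class_logits, real_label)
    return adversarial + aux_weight * auxiliary
```

With completely uninformative outputs (all logits zero, two classes) each of the three terms equals ln 2, so the loss is 3 ln 2. The loss falls as the discriminator becomes both a better real/fake judge and a better class predictor.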
Findings
Outperforms existing methods on the CUB-200-2011 and Oxford-102 datasets.
Achieves higher accuracy in classifying fine-grained details.
Produces more realistic and detailed images from complex texts.
Abstract
Fine-grained text to image synthesis involves generating images from texts that belong to different categories. In contrast to general text to image synthesis, in fine-grained synthesis there is high similarity between images of different subclasses, and there may be linguistic discrepancy among texts describing the same image. Recent Generative Adversarial Networks (GAN), such as the Recurrent Affine Transformation (RAT) GAN model, are able to synthesize clear and realistic images from texts. However, GAN models ignore fine-grained level information. In this paper we propose an approach that incorporates an auxiliary classifier in the discriminator and a contrastive learning method to improve the accuracy of fine-grained details in images synthesized by RAT GAN. The auxiliary classifier helps the discriminator classify the class of images, and helps the generator synthesize more…
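The abstract is cut off before it describes the contrastive learning method, so the following is only a generic sketch of one common choice: a supervised contrastive loss that treats image embeddings with the same class label as positives and all other embeddings as negatives. The pairing rule, the cosine similarity, the `temperature` value, and the function names are assumptions for illustration. Embeddings are assumed to be non-zero vectors.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors given as lists of floats.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    # Averages, over anchors that have at least one positive, the negative
    # log-probability of picking a same-class sample among all other samples.
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        sims = {j: cosine(embeddings[i], embeddings[j]) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

The loss is small when embeddings cluster by class and large when same-class embeddings point in different directions, which is the pressure one would want for subclasses that look alike.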
Taxonomy
Topics: Image Processing and 3D Reconstruction
Methods: Contrastive Learning · Auxiliary Classifier
