RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations
Chengde Lin, Xijun Lu, Guangxi Chen

TL;DR
RATLIP introduces a novel GAN-based framework that leverages recurrent affine transformations and CLIP for improved text-to-image synthesis, achieving higher quality and consistency in generated images.
Contribution
The paper proposes a recurrent affine transformation model combined with a CLIP-guided discriminator to enhance global information access and image-text alignment in GANs.
Findings
Outperforms state-of-the-art models on multiple datasets
Generates more consistent and detailed images from text descriptions
Utilizes pre-trained CLIP for better scene understanding
Abstract
Synthesizing high-quality photorealistic images with textual descriptions as a condition is very challenging. Generative Adversarial Networks (GANs), the classical model for this task, frequently suffer from low consistency between image and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GANs to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, with global textual information unavailable to other layers. To address this issue, we first combine CAT with a recurrent neural network to form recurrent affine transformations (RAT), ensuring that different layers can access global information. We then introduce shuffle attention between RAT to mitigate the…
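The idea described in the abstract, predicting a per-channel scale and shift from the text and letting a recurrent state link the predictions across layers, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the names `affine` and `RecurrentAffine`, the random weights, and the Elman-style tanh cell are all assumptions made for illustration (the abstract only says "recurrent neural network"), every layer is assumed to have the same number of channels, and shuffle attention and the CLIP-guided discriminator are omitted.

```python
import numpy as np


def affine(features, gamma, beta):
    """Channel-wise affine transformation: scale and shift each feature map.

    features has shape (C, H, W); gamma and beta have shape (C,).
    """
    return gamma[:, None, None] * features + beta[:, None, None]


class RecurrentAffine:
    """Hypothetical sketch of a recurrent affine controller.

    A plain tanh RNN cell (an assumption, not taken from the paper) carries a
    hidden state from one generator layer to the next, so the scale and shift
    predicted for a layer depend on what earlier layers received.
    """

    def __init__(self, text_dim, hidden_dim, channels, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # Randomly initialised weights stand in for trained parameters.
        self.W_in = rng.normal(0.0, 0.1, (hidden_dim, text_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_gamma = rng.normal(0.0, 0.1, (channels, hidden_dim))
        self.W_beta = rng.normal(0.0, 0.1, (channels, hidden_dim))

    def init_state(self):
        return np.zeros(self.hidden_dim)

    def step(self, text_emb, h):
        # Update the shared hidden state from the text embedding.
        h = np.tanh(self.W_in @ text_emb + self.W_h @ h)
        # Read the affine parameters off the state; gamma is centred on 1
        # so an all-zero state would leave the features unchanged.
        gamma = 1.0 + self.W_gamma @ h
        beta = self.W_beta @ h
        return gamma, beta, h


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    text_emb = rng.normal(size=16)          # stand-in for a sentence embedding
    features = rng.normal(size=(8, 4, 4))   # stand-in for a feature map

    rat = RecurrentAffine(text_dim=16, hidden_dim=32, channels=8)
    h = rat.init_state()
    for layer in range(3):                  # three consecutive "layers"
        gamma, beta, h = rat.step(text_emb, h)
        features = affine(features, gamma, beta)
    print(features.shape)
```

By contrast, an independent CAT in this sketch would call a fresh predictor at each layer with no shared `h`, which is the isolation between layers that the abstract identifies as the problem.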
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
Methods: Batch Normalization · Dense Connections · Feedforward Network · Conditional Batch Normalization
