CRD-CGAN: Category-Consistent and Relativistic Constraints for Diverse Text-to-Image Generation
Tao Hu, Chengjiang Long, Chunxia Xiao

TL;DR
CRD-CGAN introduces category-consistent and relativistic constraints to enhance diversity and realism in text-to-image generation, outperforming existing methods on multiple datasets.
Contribution
The paper proposes a novel CRD-CGAN model with attention, diversity, relativistic, and category-consistent losses for improved photo-realistic and diverse image synthesis from text.
Findings
Outperforms state-of-the-art methods in photorealism and diversity
Effective on multiple datasets, including Birds-200-2011, Oxford-102, and MSCOCO
Enhances sensitivity to word attention and noise in image generation
Abstract
Generating photo-realistic images from a text description is a challenging problem in computer vision. Previous works have shown promising performance in generating synthetic images conditioned on text with Generative Adversarial Networks (GANs). In this paper, we focus on category-consistent and relativistic diverse constraints to optimize the diversity of synthetic images. Based on those constraints, a category-consistent and relativistic diverse conditional GAN (CRD-CGAN) is proposed to synthesize photo-realistic images simultaneously. We use the attention loss and diversity loss to improve the sensitivity of the GAN to word attention and noise. Then, we employ the relativistic conditional loss to estimate the probability that synthetic images are relatively real or fake, which can improve the performance of the basic conditional loss. Finally, we introduce a category-consistent loss…
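Illustration (not from the paper)
The abstract describes a relativistic loss that scores how much more real or fake an image looks relative to the other class, instead of scoring each image in isolation. The sketch below shows one common way to write such a loss, the relativistic average form from the general relativistic GAN literature. It is an assumption that CRD-CGAN uses something of this shape; the paper's exact formulation is not given in the text above. The function names (`relativistic_d_loss`, `relativistic_g_loss`) are hypothetical, and the logits are assumed to come from a discriminator that has already been conditioned on the text.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_with_logit(logit: float, target: float) -> float:
    """Binary cross-entropy between sigmoid(logit) and a 0/1 target."""
    # Clamp the probability so log() never receives 0.
    p = min(max(sigmoid(logit), 1e-12), 1.0 - 1e-12)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def _mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def relativistic_d_loss(real_logits, fake_logits) -> float:
    """Discriminator loss (assumed relativistic average form).

    Each real logit is compared with the mean fake logit and should look
    "more real" (target 1); each fake logit is compared with the mean real
    logit and should look "less real" (target 0).
    """
    mean_real = _mean(real_logits)
    mean_fake = _mean(fake_logits)
    loss_real = _mean(bce_with_logit(r - mean_fake, 1.0) for r in real_logits)
    loss_fake = _mean(bce_with_logit(f - mean_real, 0.0) for f in fake_logits)
    return 0.5 * (loss_real + loss_fake)


def relativistic_g_loss(real_logits, fake_logits) -> float:
    """Generator loss: same comparison with the targets swapped."""
    mean_real = _mean(real_logits)
    mean_fake = _mean(fake_logits)
    loss_real = _mean(bce_with_logit(r - mean_fake, 0.0) for r in real_logits)
    loss_fake = _mean(bce_with_logit(f - mean_real, 1.0) for f in fake_logits)
    return 0.5 * (loss_real + loss_fake)


if __name__ == "__main__":
    # Made-up logits for illustration only.
    real = [3.0, 4.0]
    fake = [-3.0, -4.0]
    print("D loss:", relativistic_d_loss(real, fake))
    print("G loss:", relativistic_g_loss(real, fake))
```

When the discriminator cannot tell the two sets apart (all logits equal), both losses reduce to log 2, the same value as an uninformed standard GAN discriminator. When real logits sit well above fake ones, the discriminator loss approaches zero and the generator loss grows, which is the signal that pushes synthetic images toward looking relatively more real.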
