A Framework For Image Synthesis Using Supervised Contrastive Learning
Yibin Liu, Jianyu Zhang, Li Zhang, Shijian Li, and Gang Pan

TL;DR
This paper introduces a novel framework for text-to-image generation that leverages supervised contrastive learning to better utilize both inter- and inner-modal semantic relationships, significantly improving image quality.
Contribution
It proposes a dual-branch contrastive learning approach integrated into T2I GANs, enhancing semantic clustering and image realism beyond prior methods.
Findings
Significant improvements in Inception Score and FID across datasets.
Enhanced image quality on complex multi-object datasets.
Outperforms existing label-guided T2I GANs.
Abstract
Text-to-image (T2I) generation aims to produce realistic images that correspond to text descriptions. Generative Adversarial Networks (GANs) have proven successful in this task. Typical T2I GANs are two-phase methods that first pretrain an inter-modal representation from aligned image-text pairs and then train a GAN image generator on that basis. However, such a representation ignores inner-modal semantic correspondence, e.g. between images with the same label. The semantic label describes, a priori, the inherent distribution pattern with underlying cross-image relationships, which supplements the text description in capturing the full characteristics of an image. In this paper, we propose a framework that leverages both inter- and inner-modal correspondence through label-guided supervised contrastive learning. We extend the T2I GANs to two parameter-sharing contrast branches in both…
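To make the idea of label-guided supervised contrastive learning concrete, here is a minimal sketch of a generic supervised contrastive (SupCon-style) loss in NumPy. It is not the paper's implementation: the function name, the temperature value, and the use of a single batch of embeddings are assumptions for illustration, and the abstract does not specify how the loss is wired into the two contrast branches. The sketch only shows the core mechanism, in which samples sharing a label are pulled together and all other samples are pushed apart.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss over one batch (illustrative, not the paper's code).

    embeddings: (n, d) array of feature vectors (image or text features).
    labels:     (n,) array of class labels; same label => positive pair.
    temperature: assumed hyperparameter, chosen here only for illustration.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    n = z.shape[0]

    # L2-normalise so the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each sample's similarity with itself from the softmax.
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax over all *other* samples, shifted for numerical stability.
    sim_max = sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * not_self
    log_prob = (sim - sim_max) - np.log(exp_sim.sum(axis=1, keepdims=True))

    # Positives: other samples in the batch that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive; nothing to contrast

    # Average log-probability of the positives for each anchor that has any.
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[has_pos] / pos_count[has_pos]
    return float(-mean_log_prob_pos.mean())


# Usage: embeddings that cluster by label give a lower loss than
# the same embeddings paired with labels that cut across the clusters.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_aligned = supervised_contrastive_loss(features, [0, 0, 1, 1])
loss_misaligned = supervised_contrastive_loss(features, [0, 1, 0, 1])
```

In this toy batch, `loss_aligned` comes out smaller than `loss_misaligned`, which is the behaviour the abstract relies on: label information adds a training signal about cross-image relationships that aligned image-text pairs alone do not provide.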
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
