Text-to-image Synthesis via Symmetrical Distillation Networks
Mingkuan Yuan, Yuxin Peng

TL;DR
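This paper introduces Symmetrical Distillation Networks (SDN), which use a generic discriminative model (e.g. VGG19) to guide the training of a generative model on multiple levels, bridging the heterogeneous gap (text concepts vs. image pixels) and the homogeneous gap (synthetic vs. real image distributions) in text-to-image synthesis.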
Contribution
The paper proposes a novel SDN framework with symmetrical structure and two-stage training to enhance text-to-image synthesis performance.
Findings
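High-level representations from the discriminative model help bridge the heterogeneous gap between text descriptions and image content
Mid-level and low-level representations help bring synthetic images closer to real images, improving quality and relevance
Validated on two widely used datasets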
Abstract
Text-to-image synthesis aims to automatically generate images according to text descriptions given by users, which is a highly challenging task. The main issues of text-to-image synthesis lie in two gaps: the heterogeneous and homogeneous gaps. The heterogeneous gap is between the high-level concepts of text descriptions and the pixel-level contents of images, while the homogeneous gap exists between synthetic image distributions and real image distributions. To address these problems, we exploit the excellent capability of generic discriminative models (e.g. VGG19), which can guide the training process of a new generative model on multiple levels to bridge the two gaps. The high-level representations can teach the generative model to extract necessary visual information from text descriptions, which can bridge the heterogeneous gap. The mid-level and low-level representations can…
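The multi-level guidance described in the abstract can be illustrated with a small sketch. This is not the paper's loss function, which the excerpt above does not give. It only shows the general idea of comparing a generative model's features against a discriminative model's features at several levels and summing the weighted mismatches. The function name `multi_level_distillation_loss`, the level names, the use of mean squared error, and the weights are all assumptions made for illustration.

```python
from typing import Dict, List

# A flattened feature map at one level (hypothetical representation).
Feature = List[float]


def mse(a: Feature, b: Feature) -> float:
    """Mean squared error between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_level_distillation_loss(
    teacher: Dict[str, Feature],
    student: Dict[str, Feature],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-level feature mismatches.

    `teacher` stands in for features from a discriminative model such as
    VGG19; `student` stands in for the generative model's features at the
    corresponding levels. Both are assumed inputs, not the paper's API.
    """
    return sum(
        weights[level] * mse(teacher[level], student[level])
        for level in weights
    )


# Toy features at three levels (made-up numbers, for illustration only).
teacher_feats = {"high": [1.0, 2.0], "mid": [0.0, 0.0], "low": [1.0]}
student_feats = {"high": [1.0, 4.0], "mid": [1.0, 1.0], "low": [1.0]}
level_weights = {"high": 1.0, "mid": 1.0, "low": 1.0}

loss = multi_level_distillation_loss(teacher_feats, student_feats, level_weights)
```

In this toy setup the high-level term corresponds to the heterogeneous gap and the mid-level and low-level terms to the homogeneous gap; how the actual method weights or combines them is not stated in the excerpt.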
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
