TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu

TL;DR
TediGAN is a versatile framework that enables high-resolution, diverse face image generation and manipulation guided by text and other modalities, integrating StyleGAN inversion, visual-linguistic similarity, and instance-level optimization.
Contribution
It introduces a multi-modal image synthesis method with a new dataset, achieving high-quality, diverse face images guided by text and other inputs.
Findings
Produces high-quality images at 1024×1024 resolution
Supports multi-modal inputs such as sketches and semantic labels
Outperforms existing methods in the authors' experiments
Abstract
In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module maps real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity learns the text-image matching by mapping the image and text into a common embedding space. The instance-level optimization is for identity preservation in manipulation. Our model can produce diverse and high-quality images with an unprecedented resolution at 1024. Using a control mechanism based on style-mixing, our TediGAN inherently supports image synthesis with multi-modal inputs, such as sketches or semantic labels, with or without instance guidance. To facilitate text-guided multi-modal…
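The style-mixing control mechanism and the common embedding space mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `style_mix` and `cosine_similarity`, the choice of which layers to swap, and the random vectors standing in for the image and text encoder outputs are all assumptions made for illustration. It relies only on the fact that a 1024×1024 StyleGAN generator is driven by 18 per-layer style vectors of dimension 512.

```python
import numpy as np

NUM_LAYERS = 18   # per-layer style vectors in a 1024x1024 StyleGAN generator
LATENT_DIM = 512  # dimensionality of each style vector


def style_mix(w_content, w_text, text_layers):
    """Return a copy of w_content whose selected layers come from w_text.

    Both codes are (NUM_LAYERS, LATENT_DIM) arrays. Which layers to swap is a
    hypothetical choice here, not a value taken from the paper.
    """
    assert w_content.shape == w_text.shape == (NUM_LAYERS, LATENT_DIM)
    mixed = w_content.copy()
    idx = list(text_layers)
    mixed[idx] = w_text[idx]
    return mixed


def cosine_similarity(a, b):
    """Cosine similarity between two codes in the shared embedding space."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Random stand-ins for encoder outputs (assumption: real codes would come from
# the image inversion module and the text encoder).
rng = np.random.default_rng(0)
w_image = rng.standard_normal((NUM_LAYERS, LATENT_DIM))
w_text = rng.standard_normal((NUM_LAYERS, LATENT_DIM))

# Keep the first 8 layers from the image code, take the remaining 10 from text.
w_mixed = style_mix(w_image, w_text, range(8, NUM_LAYERS))
print(w_mixed.shape)
```

Because the image and text codes live in the same latent space, swapping whole layers is enough to combine them; a similarity measure such as the cosine above is one plausible way to score how well a text code matches an image code.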
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Multimodal Machine Learning Applications
Methods: Convolution · Dense Connections · Feedforward Network · R1 Regularization · Adaptive Instance Normalization · StyleGAN
