TediGAN: Text-Guided Diverse Face Image Generation and Manipulation

Weihao Xia; Yujiu Yang; Jing-Hao Xue; Baoyuan Wu

arXiv:2012.03308 · cs.CV · March 30, 2021 · 31 cites

TL;DR

TediGAN is a versatile framework that enables high-resolution, diverse face image generation and manipulation guided by text and other modalities, integrating StyleGAN inversion, visual-linguistic similarity, and instance-level optimization.

Contribution

TediGAN introduces a multi-modal image synthesis method and a new dataset, producing high-quality, diverse face images guided by text and other inputs.

Findings

01 Produces high-quality images at 1024 resolution

02 Supports multi-modal inputs such as sketches and semantic labels

03 Outperforms existing methods in experiments

Abstract

In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module maps real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity learns the text-image matching by mapping the image and text into a common embedding space. The instance-level optimization is for identity preservation in manipulation. Our model can produce diverse and high-quality images with an unprecedented resolution at 1024. Using a control mechanism based on style-mixing, our TediGAN inherently supports image synthesis with multi-modal inputs, such as sketches or semantic labels, with or without instance guidance. To facilitate text-guided multi-modal…
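As a rough illustration of the style-mixing control mentioned in the abstract, the sketch below swaps a chosen range of per-layer latent codes between two codes. Everything in it is an assumption made for illustration and is not taken from the paper or its repositories: the function name style_mix, the 18 x 512 code shape (typical of a 1024-resolution StyleGAN), the random placeholder codes standing in for an inverted image and an embedded text, and the layer split at index 8.

```python
import random

NUM_LAYERS = 18   # assumption: style layers in a 1024-resolution StyleGAN
LATENT_DIM = 512  # assumption: width of each per-layer latent code


def style_mix(w_content, w_style, layers):
    """Return a new layer-wise code: rows listed in `layers` come from
    w_style, every other row comes from w_content."""
    if len(w_content) != len(w_style):
        raise ValueError("codes must have the same number of layers")
    chosen = set(layers)
    return [
        list(w_style[i]) if i in chosen else list(w_content[i])
        for i in range(len(w_content))
    ]


def random_code(rng):
    """Placeholder for a real latent code (hypothetical stand-in for the
    output of an image or text encoder)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(NUM_LAYERS)]


rng = random.Random(0)
w_image = random_code(rng)  # stands in for an inverted real image
w_text = random_code(rng)   # stands in for an embedded text description

# Keep layers 0-7 from the image code and take layers 8-17 from the text code.
# The split point 8 is arbitrary here; which layers govern which attributes
# is not specified in this excerpt.
mixed = style_mix(w_image, w_text, layers=range(8, NUM_LAYERS))
```

In a real pipeline the mixed code would be passed to the generator; here the point is only that control comes from choosing which layers to replace.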

Code & Models

Models

🤗 huggan/TediGAN_sketch · model · ♡ 2

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Multimodal Machine Learning Applications

Methods: Convolution · Dense Connections · Feedforward Network · R1 Regularization · Adaptive Instance Normalization · StyleGAN