Conditional Image Generation and Manipulation for User-Specified Content
David Stap, Maurits Bleeker, Sarah Ibrahimi, Maartje ter Hoeve

TL;DR
This paper introduces a text-conditioned GAN for image generation and manipulation that supports user-specified facial edits, together with a new dataset of faces paired with textual descriptions.
Contribution
The paper presents textStyleGAN, a novel text-conditioned GAN; a pipeline for semantic facial image manipulation; and CelebTD-HQ, a dataset of facial images with textual descriptions.
Findings
Effective text-to-image generation with user-specific control.
Semantic facial attribute manipulation via latent space directions.
Introduction of the CelebTD-HQ dataset with textual descriptions.
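As an illustration of what manipulation via latent space directions generally means, the sketch below shifts a latent vector along a unit direction by a chosen strength. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `move_along_direction`, `projection`, and `attribute_direction`, the latent size of 8, and the random placeholder direction are all hypothetical, and the summary above does not say how the paper obtains its directions.

```python
import math
import random


def normalize(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vector]


def move_along_direction(latent, direction, strength):
    """Shift a latent vector by `strength` along the unit-normalized direction."""
    if len(latent) != len(direction):
        raise ValueError("latent and direction must have the same length")
    unit = normalize(direction)
    return [w + strength * d for w, d in zip(latent, unit)]


def projection(latent, direction):
    """Scalar position of the latent vector along the unit-normalized direction."""
    unit = normalize(direction)
    return sum(w * d for w, d in zip(latent, unit))


# Hypothetical setup: a tiny latent space and a random stand-in for a
# learned attribute direction (an assumption, not taken from the paper).
rng = random.Random(0)
dim = 8
latent = [rng.gauss(0.0, 1.0) for _ in range(dim)]
attribute_direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Moving by strength 2.0 increases the projection onto the direction by 2.0.
edited = move_along_direction(latent, attribute_direction, 2.0)
shift = projection(edited, attribute_direction) - projection(latent, attribute_direction)
```

In a real pipeline the edited latent vector would be passed to the generator to render the modified face. A negative strength moves the attribute the opposite way.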
Abstract
In recent years, Generative Adversarial Networks (GANs) have improved steadily towards generating increasingly impressive real-world images. It is useful to steer the image generation process for purposes such as content creation. This can be done by conditioning the model on additional information. However, when conditioning on additional information, there still exists a large set of images that agree with a particular conditioning. This makes it unlikely that the generated image is exactly as envisioned by a user, which is problematic for practical content creation scenarios such as generating facial composites or stock photos. To solve this problem, we propose a single pipeline for text-to-image generation and manipulation. In the first part of our pipeline we introduce textStyleGAN, a model that is conditioned on text. In the second part of our pipeline we make use of the…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Law in Society and Culture · Multimodal Machine Learning Applications
