Controllable Text-to-Image Generation

Bowen Li; Xiaojuan Qi; Thomas Lukasiewicz; Philip H. S. Torr

arXiv:1909.07083 · cs.CV · December 20, 2019 · 79 cites

TL;DR

This paper introduces ControlGAN, a text-to-image generation model that lets natural language descriptions control specific visual attributes of the generated image. It combines word-level attention in the generator with fine-grained, word-level supervision from the discriminator to improve image quality and manipulability.

Contribution

The paper presents a controllable text-to-image GAN that pairs a word-level spatial and channel-wise attention-driven generator with a word-level discriminator for fine-grained attribute control. The authors report that it outperforms previous methods.

Findings

1. Outperforms existing state-of-the-art methods on benchmark datasets.

2. Effectively manipulates specific visual attributes based on natural language descriptions.

3. Generates high-quality images with controllable features.

Abstract

In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions. To achieve this, we introduce a word-level spatial and channel-wise attention-driven generator that can disentangle different visual attributes, and allow the model to focus on generating and manipulating subregions corresponding to the most relevant words. Also, a word-level discriminator is proposed to provide fine-grained supervisory feedback by correlating words with image regions, facilitating training an effective generator which is able to manipulate specific visual attributes without affecting the generation of other content. Furthermore, perceptual loss is adopted to reduce the randomness involved in the image generation, and to…
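To make the idea of word-level spatial and channel-wise attention more concrete, here is a minimal NumPy sketch of one plausible dot-product formulation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the tensor shapes, and the `proj` matrix that maps word embeddings onto spatial locations are all hypothetical, and the abstract above does not specify the exact equations.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(words, feats):
    """Word-level spatial attention (assumed dot-product form).

    words: (L, D) word embeddings, one row per word.
    feats: (D, N) image features over N = H * W spatial locations.
    Returns the (D, N) word-context features and the (L, N) attention map,
    where each location holds a distribution over the L words.
    """
    scores = words @ feats           # (L, N): word/location similarity
    attn = softmax(scores, axis=0)   # normalise over words per location
    context = words.T @ attn         # (D, N): weighted sum of word vectors
    return context, attn


def channel_attention(words, feats, proj):
    """Word-level channel-wise attention (assumed dot-product form).

    words: (L, D) word embeddings.
    feats: (C, N) image features with C channels over N locations.
    proj:  (D, N) hypothetical projection of words onto spatial locations.
    Returns the (C, N) word-context features and the (C, L) attention map,
    where each channel holds a distribution over the L words.
    """
    w = words @ proj                 # (L, N): words in the spatial domain
    scores = feats @ w.T             # (C, L): channel/word similarity
    attn = softmax(scores, axis=1)   # normalise over words per channel
    context = attn @ w               # (C, N)
    return context, attn


# Toy usage with random inputs: 5 words, 8-dim embeddings, a 4x4 feature map.
rng = np.random.default_rng(0)
words = rng.normal(size=(5, 8))
ctx_s, attn_s = spatial_attention(words, rng.normal(size=(8, 16)))
ctx_c, attn_c = channel_attention(
    words, rng.normal(size=(6, 16)), rng.normal(size=(8, 16))
)
print(ctx_s.shape, attn_s.shape)  # (8, 16) (5, 16)
print(ctx_c.shape, attn_c.shape)  # (6, 16) (6, 5)
```

In this sketch the spatial branch asks "which words matter at this location?", while the channel-wise branch asks "which words matter for this feature channel?". The two outputs would then be combined with the image features inside the generator; how exactly they are combined is not stated here and is left out.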

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Processing and 3D Reconstruction