Controllable Text-to-Image Generation
Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr

TL;DR
This paper introduces ControlGAN, a text-to-image generative adversarial network that lets natural language descriptions control specific attributes of the generated image. It combines word-level attention with fine-grained, word-level supervision to improve both image quality and manipulability.
Contribution
The paper presents a controllable text-to-image GAN that pairs a generator driven by word-level spatial and channel-wise attention with a word-level discriminator, giving fine-grained control over individual visual attributes and outperforming previous methods.
Findings
Outperforms existing state-of-the-art methods on benchmark datasets.
Effectively manipulates specific visual attributes based on natural language descriptions.
Generates high-quality images with controllable features.
Abstract
In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions. To achieve this, we introduce a word-level spatial and channel-wise attention-driven generator that can disentangle different visual attributes, and allow the model to focus on generating and manipulating subregions corresponding to the most relevant words. Also, a word-level discriminator is proposed to provide fine-grained supervisory feedback by correlating words with image regions, facilitating training an effective generator which is able to manipulate specific visual attributes without affecting the generation of other content. Furthermore, perceptual loss is adopted to reduce the randomness involved in the image generation, and to…
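The abstract names word-level spatial and channel-wise attention but does not give the equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes plain dot-product scoring with a softmax over words. The function names, the tensor shapes, and the random matrix `proj` (standing in for a learned projection of word features into the spatial dimension) are all hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def word_spatial_attention(words, regions):
    """Assumed form of spatial attention.

    words:   (D, L) one D-dim feature per word
    regions: (D, N) one D-dim feature per image region
    Returns a (D, N) word-context feature per region and the (N, L) weights.
    """
    scores = regions.T @ words        # (N, L) region-word similarity
    attn = softmax(scores, axis=1)    # each region spreads weight over words
    context = words @ attn.T          # (D, N) weighted mix of word features
    return context, attn

def word_channel_attention(words_proj, feats):
    """Assumed form of channel-wise attention.

    words_proj: (N, L) word features projected to the spatial dimension
    feats:      (C, N) one N-dim map per channel
    Returns a (C, N) word-context map per channel and the (C, L) weights.
    """
    scores = feats @ words_proj       # (C, L) channel-word similarity
    attn = softmax(scores, axis=1)    # each channel spreads weight over words
    context = attn @ words_proj.T     # (C, N) weighted mix of projected words
    return context, attn

# Toy shapes, chosen only for illustration.
rng = np.random.default_rng(0)
D, L, N, C = 8, 5, 16, 4
words = rng.standard_normal((D, L))
regions = rng.standard_normal((D, N))
feats = rng.standard_normal((C, N))
proj = rng.standard_normal((N, D))    # hypothetical stand-in for a learned layer

spatial_ctx, spatial_attn = word_spatial_attention(words, regions)
channel_ctx, channel_attn = word_channel_attention(proj @ words, feats)
```

In this sketch each row of `spatial_attn` says which words a given image region attends to, while each row of `channel_attn` says which words a given feature channel attends to. The second view is what would let a single word, such as a colour, be tied to particular channels rather than particular locations.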
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Processing and 3D Reconstruction
