Learning What and Where to Draw
Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee

TL;DR
The paper introduces GAWWN, a GAN-based model that synthesizes images with control over content and spatial location, enabling detailed and flexible image generation from text and location instructions.
Contribution
We propose GAWWN, a novel GAN architecture that allows explicit control over what to draw and where, including part-level conditioning and flexible subset control.
Findings
High-quality 128x128 bird images conditioned on text and location
Control over object bounding boxes and parts in generated images
Preliminary results on human action image synthesis with location control
Abstract
Generative Adversarial Networks (GANs) have recently demonstrated the capability to synthesize compelling real-world images, such as room interiors, album covers, manga, faces, birds, and flowers. While existing models can synthesize images based on global constraints such as a class label or caption, they do not provide control over pose or object location. We propose a new model, the Generative Adversarial What-Where Network (GAWWN), that synthesizes images given instructions describing what content to draw in which location. We show high-quality 128 x 128 image synthesis on the Caltech-UCSD Birds dataset, conditioned on both informal text descriptions and also object location. Our system exposes control over both the bounding box around the bird and its constituent parts. By modeling the conditional distributions over part locations, our system also enables conditioning on arbitrary…
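A minimal sketch of how "where" instructions such as a bounding box or part locations could be turned into spatial conditioning inputs for a generator. It is illustrative only and is not the authors' implementation: the function names (`bbox_mask`, `keypoint_maps`, `place_embedding`), the normalized `(x, y, width, height)` box convention, and the grid size are all assumptions made for this example. A part given as `None` is treated as unspecified, which mirrors the idea of conditioning on only a subset of parts.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def bbox_mask(size: int, box: Tuple[float, float, float, float]) -> Grid:
    """Binary size x size mask: 1.0 inside the box, 0.0 outside.

    box is (x, y, width, height) in normalized [0, 1] coordinates
    (an assumed convention for this sketch).
    """
    x, y, w, h = box

    def cell(v: float) -> int:
        # Map a normalized coordinate to a grid index, clamped to the grid.
        return max(0, min(size, round(v * size)))

    col0, col1 = cell(x), cell(x + w)
    row0, row1 = cell(y), cell(y + h)
    return [
        [1.0 if row0 <= r < row1 and col0 <= c < col1 else 0.0
         for c in range(size)]
        for r in range(size)
    ]


def keypoint_maps(
    size: int, parts: List[Optional[Tuple[float, float]]]
) -> List[Grid]:
    """One size x size channel per part with a single 1.0 at its location.

    An unspecified part (None) yields an all-zero channel.
    """
    maps = []
    for part in parts:
        grid = [[0.0] * size for _ in range(size)]
        if part is not None:
            px, py = part
            col = min(size - 1, max(0, int(px * size)))
            row = min(size - 1, max(0, int(py * size)))
            grid[row][col] = 1.0
        maps.append(grid)
    return maps


def place_embedding(embedding: List[float], mask: Grid) -> List[Grid]:
    """Copy each embedding value to every cell inside the mask, zero elsewhere.

    Returns one channel per embedding dimension.
    """
    return [[[e * m for m in row] for row in mask] for e in embedding]
```

For example, `place_embedding(text_vec, bbox_mask(16, (0.25, 0.25, 0.5, 0.5)))` would produce a feature map in which a (hypothetical) text embedding `text_vec` is present only in the central region where the object should appear, while `keypoint_maps(16, [(0.5, 0.5), None])` would mark one part location and leave the second part unconstrained.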
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Multimodal Machine Learning Applications
