Learning What and Where to Draw

Scott Reed; Zeynep Akata; Santosh Mohan; Samuel Tenka; Bernt Schiele; Honglak Lee

arXiv:1610.02454 · cs.CV · October 11, 2016 · 210 cites

TL;DR

The paper introduces GAWWN, a GAN-based model that synthesizes images with control over content and spatial location, enabling detailed and flexible image generation from text and location instructions.

Contribution

We propose GAWWN, a novel GAN architecture that allows explicit control over what to draw and where, including part-level conditioning and flexible subset control.

Findings

01 High-quality 128 x 128 bird images conditioned on text and location

02 Control over object bounding boxes and parts in generated images

03 Preliminary results on human action image synthesis with location control

Abstract

Generative Adversarial Networks (GANs) have recently demonstrated the capability to synthesize compelling real-world images, such as room interiors, album covers, manga, faces, birds, and flowers. While existing models can synthesize images based on global constraints such as a class label or caption, they do not provide control over pose or object location. We propose a new model, the Generative Adversarial What-Where Network (GAWWN), that synthesizes images given instructions describing what content to draw in which location. We show high-quality 128 x 128 image synthesis on the Caltech-UCSD Birds dataset, conditioned on both informal text descriptions and also object location. Our system exposes control over both the bounding box around the bird and its constituent parts. By modeling the conditional distributions over part locations, our system also enables conditioning on arbitrary…
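
To make the idea of conditioning on "what" and "where" concrete, the following is a minimal sketch of one way a text embedding and a bounding box could be combined into a single spatial conditioning tensor: the embedding is copied into every grid cell that falls inside the box, and cells outside the box stay zero. The function name `box_conditioning_map`, the grid size, the `(x, y, w, h)` box convention, and the embedding dimension are all assumptions chosen for illustration; this is not the authors' implementation and none of these names come from the abstract.

```python
import numpy as np


def box_conditioning_map(text_embedding, box, grid_size=16):
    """Place a text embedding inside a normalized bounding box on a grid.

    text_embedding: 1-D array of length D (the "what").
    box: hypothetical (x, y, w, h) tuple in [0, 1] units (the "where").
    Returns an array of shape (grid_size, grid_size, D) that is zero
    everywhere outside the box.
    """
    x, y, w, h = box
    dim = text_embedding.shape[0]
    cond = np.zeros((grid_size, grid_size, dim), dtype=text_embedding.dtype)

    # Convert normalized coordinates to grid cell indices, keeping the
    # box at least one cell wide and tall and inside the grid.
    x0 = min(grid_size - 1, int(np.floor(x * grid_size)))
    y0 = min(grid_size - 1, int(np.floor(y * grid_size)))
    x1 = min(grid_size, max(x0 + 1, int(np.ceil((x + w) * grid_size))))
    y1 = min(grid_size, max(y0 + 1, int(np.ceil((y + h) * grid_size))))

    # Broadcast the embedding over every cell inside the box.
    cond[y0:y1, x0:x1, :] = text_embedding
    return cond


# Toy usage: a 4-dimensional stand-in for a caption embedding and a box
# covering the horizontal middle half and a quarter of the height.
embedding = np.ones(4, dtype=np.float32)
cond = box_conditioning_map(embedding, box=(0.25, 0.5, 0.5, 0.25))
print(cond.shape)
```

A tensor like this could be concatenated with feature maps inside a generator or discriminator so that the text signal is only present where the object is meant to appear. How the actual model consumes location information is described in the full paper, not on this page.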

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Multimodal Machine Learning Applications