Image Generation from Scene Graphs

Justin Johnson; Agrim Gupta; Li Fei-Fei

arXiv:1804.01622·cs.CV·April 6, 2018·47 cites

TL;DR

This paper introduces a novel method for generating complex images from scene graphs by explicitly modeling objects and their relationships, using graph convolution, scene layout prediction, and adversarial training.

Contribution

The proposed approach combines graph convolution, scene layout prediction, and cascaded refinement to generate detailed images from scene graphs, addressing limitations of previous text-based methods.

Findings

01 Successfully generates complex images with multiple objects.

02 Outperforms existing methods on Visual Genome and COCO-Stuff datasets.

03 User studies confirm high realism and fidelity of generated images.

Abstract

To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects and relationships. To overcome this limitation we propose a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships. Our model uses graph convolution to process input graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout to an image with a cascaded refinement network. The network is trained adversarially against a pair of discriminators to ensure realistic outputs. We…
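
To make the input representation concrete, the sketch below shows one way a scene graph could be encoded as objects plus (subject, predicate, object) triples, followed by a single round of message passing over its edges. It is an illustration under stated assumptions, not the authors' implementation: the names `Triple`, `mean`, and `message_passing_round` are hypothetical, the vectors are toy values, and plain averaging stands in for the learned functions a real graph convolution layer would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One directed edge of a scene graph (hypothetical helper type)."""
    subject: int    # index into the object list
    predicate: str  # relationship name, e.g. "standing on"
    obj: int        # index into the object list


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def message_passing_round(obj_vecs, pred_vecs, triples):
    """One simplified round of message passing over the graph edges.

    Assumption: averaging replaces the learned networks of a real layer.
    Each edge produces a message from its subject, predicate, and object
    vectors; each object then pools the messages of every edge it touches.
    """
    inbox = [[] for _ in obj_vecs]
    for t in triples:
        message = mean([obj_vecs[t.subject], pred_vecs[t.predicate], obj_vecs[t.obj]])
        inbox[t.subject].append(message)
        inbox[t.obj].append(message)
    # Objects that take part in no relationship keep their current vector.
    return [mean(msgs) if msgs else list(vec) for msgs, vec in zip(inbox, obj_vecs)]


# Toy graph: "sheep standing on grass", "sky above grass", plus an isolated "tree".
objects = ["sheep", "grass", "sky", "tree"]
obj_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
pred_vecs = {"standing on": [0.5, 0.5], "above": [0.0, 0.0]}
triples = [Triple(0, "standing on", 1), Triple(2, "above", 1)]

updated = message_passing_round(obj_vecs, pred_vecs, triples)
for name, vec in zip(objects, updated):
    print(name, vec)
```

In this toy graph, "grass" takes part in both relationships, so its updated vector pools two messages, while "tree" has no edges and is left unchanged. In a full pipeline of the kind the abstract describes, the per-object vectors from this stage would feed the layout prediction step.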

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Convolution