Image Generation from Scene Graphs
Justin Johnson, Agrim Gupta, Li Fei-Fei

TL;DR
This paper introduces a novel method for generating complex images from scene graphs by explicitly modeling objects and their relationships, using graph convolution, scene layout prediction, and adversarial training.
Contribution
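The proposed approach combines graph convolution, scene layout prediction, and a cascaded refinement network to generate detailed images from scene graphs. This addresses a key limitation of earlier text-to-image methods, which struggle to faithfully reproduce sentences describing many objects and relationships.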
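The graph convolution step can be illustrated with a minimal NumPy sketch. It assumes the scene graph is given as object embeddings, one predicate embedding per edge, and (subject, object) index pairs. Each (subject, predicate, object) triple passes through a shared network that emits a new predicate vector plus candidate vectors for the subject and object. Candidates are average-pooled per object and passed through a second network. The single linear+ReLU layers, the weight names `W` and `W_h`, and the function `graph_conv` are hypothetical simplifications, not the authors' implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def graph_conv(obj_vecs, pred_vecs, edges, W, W_h):
    """One simplified graph-convolution layer over a scene graph (sketch).

    obj_vecs:  (O, D) object embeddings
    pred_vecs: (T, D) predicate embeddings, one per edge
    edges:     (T, 2) int array of (subject_index, object_index)
    W:         (3D, 2H + D) hypothetical single-layer triple network
    W_h:       (H, D) hypothetical single-layer pooling network
    """
    O, D = obj_vecs.shape
    H = W_h.shape[0]
    s_idx, o_idx = edges[:, 0], edges[:, 1]
    # Concatenate (subject, predicate, object) vectors for every triple.
    triples = np.concatenate([obj_vecs[s_idx], pred_vecs, obj_vecs[o_idx]], axis=1)
    out = relu(triples @ W)
    # Split the output into subject candidate, new predicate, object candidate.
    cand_s, new_pred, cand_o = out[:, :H], out[:, H:H + D], out[:, H + D:]
    # Average-pool all candidate vectors that each object receives.
    pooled = np.zeros((O, H))
    counts = np.zeros(O)
    np.add.at(pooled, s_idx, cand_s)
    np.add.at(pooled, o_idx, cand_o)
    np.add.at(counts, s_idx, 1)
    np.add.at(counts, o_idx, 1)
    pooled /= np.maximum(counts, 1)[:, None]
    # Objects with no edges receive a zero pooled vector in this sketch.
    new_obj = relu(pooled @ W_h)
    return new_obj, new_pred
```

Stacking several such layers lets information propagate along edges, so each object's vector reflects its relationships. Because an object with no edges gets nothing to pool here, a practical version needs some way to connect isolated objects to the graph.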
Findings
Successfully generates complex images with multiple objects.
Outperforms existing methods on Visual Genome and COCO-Stuff datasets.
User studies confirm high realism and fidelity of generated images.
Abstract
To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects and relationships. To overcome this limitation we propose a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships. Our model uses graph convolution to process input graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout to an image with a cascaded refinement network. The network is trained adversarially against a pair of discriminators to ensure realistic outputs. We…
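The layout step in the abstract turns per-object boxes and masks into a single scene layout. The following minimal NumPy sketch assumes each object already has an embedding, a soft M×M mask, and a box `(x0, y0, x1, y1)` normalized to [0, 1]. It warps each mask into its box, scales it by the object embedding, and sums over objects to obtain a D×H×W layout. The function name `compose_layout` is hypothetical. Nearest-neighbour resizing stands in for the smoother interpolation a trainable model would need.

```python
import numpy as np

def compose_layout(obj_vecs, masks, boxes, H, W):
    """Sum mask-weighted object embeddings into a (D, H, W) layout (sketch).

    obj_vecs: (O, D) object embeddings
    masks:    (O, M, M) soft masks with values in [0, 1]
    boxes:    (O, 4) boxes as (x0, y0, x1, y1), normalized to [0, 1]
    """
    O, D = obj_vecs.shape
    M = masks.shape[1]
    boxes = np.clip(boxes, 0.0, 1.0)  # keep boxes inside the canvas
    layout = np.zeros((D, H, W))
    for i in range(O):
        x0, y0, x1, y1 = boxes[i]
        # Pixel extents of the box on the H x W canvas.
        c0, c1 = int(round(x0 * W)), int(round(x1 * W))
        r0, r1 = int(round(y0 * H)), int(round(y1 * H))
        bh, bw = r1 - r0, c1 - c0
        if bh <= 0 or bw <= 0:
            continue  # degenerate box covers no pixels
        # Nearest-neighbour resize of the M x M mask to bh x bw.
        rows = np.arange(bh) * M // bh
        cols = np.arange(bw) * M // bw
        resized = masks[i][np.ix_(rows, cols)]
        # Broadcast the embedding over the mask: (D, 1, 1) * (1, bh, bw).
        layout[:, r0:r1, c0:c1] += obj_vecs[i][:, None, None] * resized[None]
    return layout
```

Summing overlapping objects, rather than overwriting them, keeps every object's features in the layout. The downstream image generator then resolves how overlapping regions should look.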
Taxonomy
Topics: Multimodal Machine Learning Applications
Methods: Convolution
