Transforming Image Generation from Scene Graphs
Renato Sortino, Simone Palazzo, Concetto Spampinato

TL;DR
This paper introduces a transformer-based model, conditioned on scene graphs, for controllable image generation. The model supports iterative modification of images and better satisfies semantic constraints; results are demonstrated on the CIFAR10 and MNIST datasets.
Contribution
It presents a transformer architecture for scene-graph-conditioned image synthesis that, unlike recent transformer-based methods, includes a decoder to compose images autoregressively, giving more control over and flexibility in the generation process.
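The key architectural point is that a decoder composes the image step by step, with each step conditioned on the scene graph. The loop below is a minimal sketch of that idea, assuming the image is represented as a sequence of discrete tokens. The token representation, `VOCAB_SIZE`, `autoregressive_decode`, and `toy_next_token` are all hypothetical; the stub replaces a trained transformer decoder and is not the authors' method.

```python
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
VOCAB_SIZE = 16  # hypothetical size of the image-token vocabulary


def toy_next_token(graph: Sequence[Triple], tokens: List[int]) -> int:
    # Hypothetical stand-in for a transformer decoder: a deterministic
    # function of the conditioning graph and the tokens generated so far.
    return (3 * len(graph) + sum(tokens) + len(tokens)) % VOCAB_SIZE


def autoregressive_decode(
    graph: Sequence[Triple],
    next_token: Callable[[Sequence[Triple], List[int]], int],
    length: int,
    prefix: Sequence[int] = (),
) -> List[int]:
    """Generate tokens one at a time until `length` is reached. Each step is
    conditioned on the scene graph and on every token produced so far; a
    user-provided prefix is kept unchanged and decoding continues after it."""
    if len(prefix) > length:
        raise ValueError("prefix is longer than the requested length")
    tokens = list(prefix)
    while len(tokens) < length:
        tokens.append(next_token(graph, tokens))
    return tokens


graph = [("sky", "above", "grass"), ("tree", "left of", "house")]
print(autoregressive_decode(graph, toy_next_token, 4))          # [6, 13, 11, 7]
print(autoregressive_decode(graph, toy_next_token, 4, [1, 2]))  # [1, 2, 11, 7]
```

Passing a prefix shows one generic way an autoregressive model can continue from a partial result; this summary does not specify how the paper's model ingests partial inputs.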
Findings
The model satisfies the semantic constraints defined by the input scene graphs.
It models relations between visual objects while taking user input into account.
It shows promising results on the CIFAR10 and MNIST datasets.
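To make "satisfying semantic constraints" concrete, the sketch below checks spatial scene-graph triples against object bounding boxes and reports the fraction that hold. This is a hypothetical illustration, not the paper's evaluation protocol: the box format, the predicate set, and the names `PREDICATES` and `constraint_satisfaction` are assumptions.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward
Triple = Tuple[str, str, str]            # (subject, predicate, object)


def center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


# Hypothetical geometric tests for a few spatial predicates, comparing box centers.
PREDICATES = {
    "left of": lambda s, o: center(s)[0] < center(o)[0],
    "right of": lambda s, o: center(s)[0] > center(o)[0],
    "above": lambda s, o: center(s)[1] < center(o)[1],
    "below": lambda s, o: center(s)[1] > center(o)[1],
}


def constraint_satisfaction(graph: Sequence[Triple], boxes: Dict[str, Box]) -> float:
    """Fraction of spatial triples whose box geometry matches the predicate.
    Triples with predicates outside PREDICATES are skipped."""
    checked = [PREDICATES[p](boxes[s], boxes[o]) for s, p, o in graph if p in PREDICATES]
    return sum(checked) / len(checked) if checked else 1.0


boxes = {"sky": (0, 0, 10, 3), "grass": (0, 7, 10, 10),
         "tree": (1, 2, 3, 8), "house": (5, 3, 9, 8)}
graph = [("sky", "above", "grass"), ("tree", "left of", "house"),
         ("house", "above", "tree")]
print(round(constraint_satisfaction(graph, boxes), 3))  # 0.667
```

The third triple fails because the house's center (y = 5.5) lies slightly below the tree's (y = 5).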
Abstract
Generating images from semantic visual knowledge is a challenging task. Such knowledge can be useful to condition the synthesis process in more complex, subtle, and unambiguous ways than alternatives such as class labels or text descriptions. Although generative methods conditioned on semantic representations exist, they offer no way to control the generation process beyond specifying constraints between objects. For example, the ability to iteratively generate or modify images by manually adding specific items is a desirable property that, to our knowledge, has not been fully investigated in the literature. In this work we propose a transformer-based approach conditioned on scene graphs that, unlike recent transformer-based methods, also employs a decoder to autoregressively compose images, making the synthesis process more effective and controllable. The…
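The abstract highlights iteratively modifying an image by manually adding items. A minimal sketch of the input side of that workflow, assuming an edit is expressed as new nodes and edges in the scene graph whose updated triples are then passed back to the generator, might look like this. The `SceneGraph` class and its methods are hypothetical illustrations, not the paper's data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneGraph:
    """Minimal scene graph: object nodes plus directed, labelled relations.
    Relations refer to nodes by index, so repeated labels (two trees) are allowed."""
    objects: List[str] = field(default_factory=list)
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        # Append a node and return its index for use in relations.
        self.objects.append(label)
        return len(self.objects) - 1

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        if not (0 <= subj < len(self.objects) and 0 <= obj < len(self.objects)):
            raise IndexError("relation refers to an unknown object")
        self.relations.append((subj, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        # Flatten to (subject, predicate, object) label triples for conditioning.
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]


g = SceneGraph()
sky, grass = g.add_object("sky"), g.add_object("grass")
g.add_relation(sky, "above", grass)
print(g.triples())  # [('sky', 'above', 'grass')]

# A user edit: add a tree standing on the grass, then regenerate from the new triples.
tree = g.add_object("tree")
g.add_relation(tree, "on", grass)
print(g.triples())  # [('sky', 'above', 'grass'), ('tree', 'on', 'grass')]
```

Indexing nodes rather than keying them by label is a design choice that keeps duplicate objects distinct when the user adds several items of the same class.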
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
