GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
Martin Engelcke, Adam R. Kosiorek, Oiwi Parker Jones, Ingmar Posner

TL;DR
GENESIS is a novel object-centric generative model for 3D scenes that captures object relationships, enabling scene decomposition and principled sampling of new scenes, advancing visual scene understanding in robotics and reinforcement learning.
Contribution
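It introduces GENESIS, which the authors describe as the first object-centric generative model of 3D visual scenes that can both decompose and generate scenes by capturing relationships between scene components. The prior models MONet and IODINE also decompose scenes into objects, but their generative processes do not account for component interactions, so neither allows principled sampling of novel scenes.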
Findings
Effective scene decomposition demonstrated on multiple datasets.
Capable of generating realistic novel scenes with object interactions.
Improved semi-supervised learning performance.
Abstract
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning. Yet, even though tasks in these domains typically involve distinct objects, most state-of-the-art generative models do not explicitly capture the compositional nature of visual scenes. Two recent exceptions, MONet and IODINE, decompose scenes into objects in an unsupervised fashion. Their underlying generative processes, however, do not account for component interactions. Hence, neither of them allows for principled sampling of novel scenes. Here we present GENESIS, the first object-centric generative model of 3D visual scenes capable of both decomposing and generating scenes by capturing relationships between scene components. GENESIS parameterises a spatial GMM over images which is decoded from a set of object-centric latent variables that are either inferred sequentially in an…
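The abstract states that GENESIS parameterises a spatial GMM over images. A minimal NumPy sketch of that idea follows. Each pixel is modelled as a mixture of K Gaussians, one per scene component, and the per-pixel mixing probabilities come from the component masks. Several pieces are assumptions for illustration, not the authors' code. These include the stick-breaking construction that turns raw mask values into mixing probabilities (in the spirit of MONet's attention "scope"), the function names, and the fixed standard deviation. The networks that map the object-centric latents to masks and means are omitted entirely.

```python
import numpy as np


def stick_breaking(raw_masks):
    """Turn K-1 per-pixel values in (0, 1) into K mixing probabilities.

    Assumed construction (hypothetical helper): component k claims a fraction
    raw_masks[k] of whatever part of each pixel is still unexplained, and the
    last component takes the remainder.

    raw_masks: array of shape (K-1, H, W), entries in (0, 1).
    Returns pi of shape (K, H, W) that sums to 1 over K at every pixel.
    """
    km1, h, w = raw_masks.shape
    pi = np.empty((km1 + 1, h, w))
    scope = np.ones((h, w))  # fraction of each pixel not yet explained
    for k in range(km1):
        pi[k] = scope * raw_masks[k]
        scope = scope * (1.0 - raw_masks[k])
    pi[-1] = scope  # last component takes whatever remains
    return pi


def spatial_gmm_log_likelihood(x, pi, mu, sigma=0.7):
    """log p(x) under a per-pixel Gaussian mixture (hypothetical helper).

    x:     image, shape (C, H, W)
    pi:    mixing probabilities, shape (K, H, W)
    mu:    per-component means, shape (K, C, H, W)
    sigma: fixed std shared by all components (illustrative value only)
    """
    # log N(x | mu_k, sigma^2 I) for each component and pixel, summed over channels
    log_norm = -0.5 * np.log(2 * np.pi * sigma**2)
    log_gauss = (log_norm - 0.5 * ((x[None] - mu) / sigma) ** 2).sum(axis=1)  # (K, H, W)
    # log sum_k pi_k N(...), computed stably with log-sum-exp over components
    a = np.log(pi + 1e-12) + log_gauss
    a_max = a.max(axis=0)
    log_mix = a_max + np.log(np.exp(a - a_max).sum(axis=0))
    # pixels are treated as conditionally independent given the latents
    return log_mix.sum()
```

Because every pixel's mixing weights sum to one, the masks softly partition the image among components. That is what lets such a model be read as a scene decomposition.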
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
Methods: Mixture model network · Generalized ELBO with Constrained Optimization · Spatial Broadcast Decoder · Gated Linear Unit · Exponential Linear Unit · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Batch Normalization · Adam
