SG-VAE: Scene Grammar Variational Autoencoder to generate new indoor scenes
Pulak Purkait, Christopher Zach, Ian Reid

TL;DR
This paper introduces SG-VAE, a scene grammar variational autoencoder that learns to generate valid, coherent indoor scene layouts by encoding and decoding scene parse trees. The learned representation is useful for vision tasks such as 3D pose and layout estimation.
Contribution
It presents an automatic grammar extraction method and a grammar-based auto-encoder for generating realistic indoor scenes with valid object arrangements.
Findings
The model generates valid and coherent indoor scene layouts.
It learns meaningful latent representations for scene synthesis.
It is applicable to 3D pose and layout estimation tasks.
Abstract
Deep generative models have been used in recent years to learn coherent latent representations in order to synthesize high-quality images. In this work, we propose a neural network to learn a generative model for sampling consistent indoor scene layouts. Our method learns the co-occurrences and appearance parameters, such as shape and pose, of different object categories through a grammar-based auto-encoder, resulting in a compact and accurate representation for scene layouts. In contrast to existing grammar-based methods with a user-specified grammar, we construct the grammar automatically by extracting a set of production rules by reasoning about object co-occurrences in training data. The extracted grammar is able to represent a scene by an augmented parse tree. The proposed auto-encoder encodes these parse trees to a latent code, and decodes the latent code to a parse tree,…
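To make the parse-tree idea concrete, here is a minimal sketch of how a scene could be written as a sequence of production rules, one-hot encoded, and decoded back while masking rules that are invalid for the current non-terminal. It is a toy illustration, not the authors' implementation: the grammar in `RULES`, the symbol names, and the `encode`/`decode` helpers are all hypothetical, the grammar is written by hand rather than extracted from co-occurrence statistics, and the pose and shape attributes of the augmented parse tree are omitted.

```python
import numpy as np

# Hypothetical hand-written grammar (the paper extracts its rules from data).
# Upper-case symbols are non-terminals; lower-case symbols are object categories.
RULES = [
    ("SCENE", ("bed", "BED_CTX")),            # 0
    ("BED_CTX", ("nightstand", "BED_CTX")),   # 1
    ("BED_CTX", ("lamp", "BED_CTX")),         # 2
    ("BED_CTX", ()),                          # 3: stop expanding
]


def encode(rule_ids):
    """Turn a sequence of rule indices into a one-hot matrix (steps x rules)."""
    onehot = np.zeros((len(rule_ids), len(RULES)), dtype=np.float32)
    onehot[np.arange(len(rule_ids)), rule_ids] = 1.0
    return onehot


def decode(logits):
    """Expand the start symbol step by step.

    At each step, only rules whose left-hand side matches the non-terminal
    on top of the stack are allowed; the best-scoring allowed rule is applied.
    """
    stack = ["SCENE"]
    objects = []
    for step in logits:
        if not stack:
            break
        lhs = stack.pop()
        mask = np.array([rule_lhs == lhs for rule_lhs, _ in RULES])
        scores = np.where(mask, step, -np.inf)  # invalid rules can never win
        rhs = RULES[int(np.argmax(scores))][1]
        objects.extend(s for s in rhs if s.islower())
        stack.extend(s for s in reversed(rhs) if s.isupper())
    return objects


scene_rules = [0, 1, 2, 3]           # bed, then nightstand, then lamp, then stop
codes = encode(scene_rules)
decoded_objects = decode(codes)
```

In a full model, a neural encoder would map `codes` to a latent vector and a decoder would produce the per-step scores passed to `decode`; the masking step is what keeps every decoded sequence a valid derivation under the grammar.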
