Investigating Object Compositionality in Generative Adversarial Networks

Sjoerd van Steenkiste; Karol Kurach; Jürgen Schmidhuber; Sylvain Gelly

arXiv:1810.10340 · cs.CV · July 27, 2020

TL;DR

This paper builds object compositionality into GANs as an inductive bias. The resulting structured generator produces better multi-object images and enables unsupervised instance segmentation.

Contribution

The paper presents a minimal modification of a standard GAN generator that incorporates object compositionality as an inductive bias, improving multi-object image synthesis and enabling unsupervised instance segmentation.

Findings

01. Structured GANs generate more faithful multi-object images.

02. The approach improves unsupervised instance segmentation on CLEVR.

03. Incorporating structure benefits representation learning.

Abstract

Deep generative models seek to recover the process with which the observed data was generated. They may be used to synthesize new samples or to subsequently extract representations. Successful approaches in the domain of images are driven by several core inductive biases. However, a bias to account for the compositional way in which humans structure a visual scene in terms of objects has frequently been overlooked. In this work, we investigate object compositionality as an inductive bias for Generative Adversarial Networks (GANs). We present a minimal modification of a standard generator to incorporate this inductive bias and find that it reliably learns to generate images as compositions of objects. Using this general design as a backbone, we then propose two useful extensions to incorporate dependencies among objects and background. We extensively evaluate our approach on several…
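
To make the idea of generating an image as a composition of objects concrete, the following is a minimal sketch and not the authors' implementation. It assumes each object is produced from its own latent vector by a single shared generator that outputs an RGBA image, and that the objects are combined by alpha compositing over a background. The names `toy_generator` and `compose`, the linear-plus-sigmoid generator, and all sizes are hypothetical placeholders chosen for illustration.

```python
import numpy as np

def toy_generator(z, weights, height, width):
    """Hypothetical stand-in for a shared generator (assumption, not the
    paper's network): maps one latent vector to an RGBA image in [0, 1]."""
    out = z @ weights                     # flat vector of length H * W * 4
    out = 1.0 / (1.0 + np.exp(-out))      # sigmoid squashes values into [0, 1]
    return out.reshape(height, width, 4)

def compose(latents, weights, height=8, width=8, background=None):
    """Alpha-composite one generated object per latent over a background."""
    if background is None:
        background = np.zeros((height, width, 3))
    canvas = background.copy()
    for z in latents:
        rgba = toy_generator(z, weights, height, width)
        rgb, alpha = rgba[..., :3], rgba[..., 3:]
        # Standard "over" operator: the new object covers the canvas
        # in proportion to its alpha channel.
        canvas = alpha * rgb + (1.0 - alpha) * canvas
    return canvas

rng = np.random.default_rng(0)
latent_dim, K, H, W = 16, 3, 8, 8
weights = rng.normal(size=(latent_dim, H * W * 4))   # shared across objects
latents = rng.normal(size=(K, latent_dim))           # one latent per object
image = compose(latents, weights, H, W)
empty = compose(np.zeros((0, latent_dim)), weights, H, W)
```

Because the same weights are reused for every latent, the number of objects can be varied without changing the generator. The alpha channels also give a per-object mask, which is one plausible way a composition of this kind could support segmentation.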
