SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

Zhixuan Lin; Yi-Fu Wu; Skand Vishwanath Peri; Weihao Sun; Gautam Singh; Fei Deng; Jindong Jiang; Sungjin Ahn

arXiv:2001.02407 · cs.LG · March 17, 2020 · 44 citations



TL;DR

SPACE is a generative model that unifies spatial-attention and scene-mixture techniques to efficiently decompose complex scenes into object and background representations, and it scales to scenes with many objects.

Contribution

It introduces a probabilistic framework that combines the strengths of spatial-attention and scene-mixture methods, enabling scalable, detailed scene decomposition with explicit object and background representations.
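To make the "spatial attention" half of this concrete, here is a minimal sketch of grid-based object detection of the kind used by spatial-attention models. It assumes, for illustration only, that the image is divided into a grid of cells, that each cell carries a presence probability, a box-centre offset within the cell, and a box size, and that a cell counts as containing an object when its presence probability exceeds a threshold. The names CellLatent, cell_to_box, and detect_objects are hypothetical and are not taken from the SPACE code. The point the sketch shows is that each cell is decoded independently of every other cell, which is what allows this step to run in parallel rather than one object at a time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CellLatent:
    """Hypothetical per-cell foreground latent (field names are illustrative)."""
    pres: float                  # probability that this cell contains an object
    offset: Tuple[float, float]  # box-centre offset inside the cell, each in [0, 1]
    scale: Tuple[float, float]   # box width/height as a fraction of the image


@dataclass
class Box:
    """Bounding box in normalised image coordinates, all values in [0, 1]."""
    cx: float
    cy: float
    w: float
    h: float


def cell_to_box(row: int, col: int, latent: CellLatent, grid_h: int, grid_w: int) -> Box:
    """Map one cell's local latent to an image-space box."""
    cx = (col + latent.offset[0]) / grid_w
    cy = (row + latent.offset[1]) / grid_h
    return Box(cx, cy, latent.scale[0], latent.scale[1])


def detect_objects(grid: List[List[CellLatent]], threshold: float = 0.5) -> List[Box]:
    """Decode every cell independently. No cell reads another cell's output,
    so in a tensor framework this loop becomes a single batched operation."""
    grid_h, grid_w = len(grid), len(grid[0])
    return [
        cell_to_box(r, c, grid[r][c], grid_h, grid_w)
        for r in range(grid_h)
        for c in range(grid_w)
        if grid[r][c].pres > threshold
    ]
```

For example, in a 2×2 grid where only the bottom-left cell has a high presence probability and an offset of (0.5, 0.5), detect_objects returns a single box centred at (0.25, 0.75). Because the per-cell work has no sequential dependency, the cost of adding more cells (and therefore room for more objects) does not lengthen a chain of sequential steps.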

Findings

01. Outperforms previous models such as SPAIR, IODINE, and GENESIS.

02. Effectively decomposes scenes with many objects without performance loss.

03. Demonstrates applicability on the Atari and 3D-Rooms datasets.

Abstract

The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieving higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are based on either spatial-attention or scene-mixture approaches and are limited in scalability, which is a main obstacle toward modeling real-world scenes. In this paper, we propose a generative latent variable model, called SPACE, that provides a unified probabilistic modeling framework that combines the best of spatial-attention and scene-mixture approaches. SPACE can explicitly provide factorized object representations for foreground objects while also decomposing background segments of complex morphology. Previous models are good at one of these, but not both. SPACE also resolves the scalability problems of previous methods by incorporating parallel…
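One way to picture how a single model can explain pixels with both foreground objects and background segments is a per-pixel mixture: a foreground mixing probability alpha decides how much of the pixel is explained by the object decoder, and the remaining (1 - alpha) is split among K background components with weights pi_k. The sketch below assumes that form, along with a Gaussian pixel likelihood with a fixed standard deviation and conditionally independent pixels. These are illustrative assumptions rather than a transcription of the paper's equations, and every function name is hypothetical.

```python
import math
from typing import Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D Gaussian, used here as a stand-in pixel likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def pixel_likelihood(
    x: float,
    alpha: float,
    fg_mean: float,
    bg_weights: Sequence[float],
    bg_means: Sequence[float],
    std: float = 0.1,
) -> float:
    """Hypothetical mixture: alpha * p_fg(x) + (1 - alpha) * sum_k pi_k * p_bg_k(x).

    alpha      -- foreground mixing probability at this pixel, in [0, 1]
    fg_mean    -- pixel value predicted by the foreground (object) decoder
    bg_weights -- mixing weights pi_k of the K background segments (must sum to 1)
    bg_means   -- pixel value predicted by each background component
    """
    if abs(sum(bg_weights) - 1.0) > 1e-6:
        raise ValueError("background mixing weights must sum to 1")
    fg = gaussian_pdf(x, fg_mean, std)
    bg = sum(w * gaussian_pdf(x, m, std) for w, m in zip(bg_weights, bg_means))
    return alpha * fg + (1.0 - alpha) * bg


def image_log_likelihood(pixels, alphas, fg_means, bg_weights, bg_means, std=0.1) -> float:
    """Sum of per-pixel log-likelihoods, assuming pixels are conditionally independent.
    bg_weights and bg_means hold one sequence of K values per pixel."""
    return sum(
        math.log(pixel_likelihood(x, a, f, w, m, std))
        for x, a, f, w, m in zip(pixels, alphas, fg_means, bg_weights, bg_means)
    )
```

Setting alpha to 1 reduces the pixel likelihood to the foreground term alone, and setting it to 0 reduces it to a pure background mixture. Intermediate values let object and background explanations share responsibility for a pixel, which is the sense in which this form combines a spatial-attention foreground with a scene-mixture background.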


Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging