SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

TL;DR
SPACE is a generative model that unifies spatial attention and scene mixture techniques to decompose complex scenes into object and background representations efficiently, and it scales to scenes with many objects.
Contribution
It introduces a probabilistic framework that combines the strengths of previous methods, enabling scalable, detailed scene decomposition with explicit object and background representations.
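To make the idea of combining the two approaches concrete, here is a minimal sketch of how a foreground render (the spatial-attention side) and several background segments (the scene-mixture side) could be blended pixel by pixel. It is not the authors' implementation. The function name `compose_scene`, its arguments, and the choice to blend pixel intensities directly (rather than likelihoods inside a full probabilistic model) are all assumptions made for illustration.

```python
def compose_scene(foreground, alpha, background_segments, segment_weights):
    """Blend one foreground render with K background segments, pixel by pixel.

    All names here are hypothetical and chosen for illustration only.

    foreground:          pixel intensities from the object branch
    alpha:               per-pixel foreground mixing weights in [0, 1]
    background_segments: K lists of pixel intensities, one per background segment
    segment_weights:     K lists of per-pixel weights; for each pixel the
                         weights are assumed to sum to 1 across the K segments
    """
    composed = []
    for i, (fg, a) in enumerate(zip(foreground, alpha)):
        # Background at pixel i is a weighted mixture over the K segments.
        bg = sum(w[i] * seg[i]
                 for seg, w in zip(background_segments, segment_weights))
        # The foreground claims a fraction `a` of the pixel; the background
        # mixture fills the remaining (1 - a).
        composed.append(a * fg + (1.0 - a) * bg)
    return composed


# Three pixels: fully foreground, half-and-half, fully background.
image = compose_scene(
    foreground=[1.0, 1.0, 1.0],
    alpha=[1.0, 0.5, 0.0],
    background_segments=[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
    segment_weights=[[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
)
print(image)
```

In this toy setup the first pixel is explained entirely by the foreground, the last entirely by the second background segment, and the middle pixel is an even blend of the foreground and the first background segment.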
Findings
Outperforms previous models such as SPAIR, IODINE, and GENESIS.
Effectively decomposes scenes with many objects without performance loss.
Demonstrates applicability on Atari and 3D-Rooms datasets.
Abstract
The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieve higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are either based on spatial-attention or scene-mixture approaches and limited in scalability which is a main obstacle towards modeling real-world scenes. In this paper, we propose a generative latent variable model, called SPACE, that provides a unified probabilistic modeling framework that combines the best of spatial-attention and scene-mixture approaches. SPACE can explicitly provide factorized object representations for foreground objects while also decomposing background segments of complex morphology. Previous models are good at either of these, but not both. SPACE also resolves the scalability problems of previous methods by incorporating parallel…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
