Multi-Object Representation Learning with Iterative Variational Inference
Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Chris Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner

TL;DR
This paper introduces a novel unsupervised method for joint object segmentation and representation learning using iterative variational inference, enabling generalization to more objects, occlusion inpainting, and sequence modeling.
Contribution
It presents a new unsupervised approach that learns to segment and represent multiple objects jointly. Its iterative variational inference can capture multi-modal posteriors and extends to image sequences.
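For readers new to iterative amortized inference, the refinement loop can be summarized as below. This is a sketch in our own notation, based on our reading of the method rather than a quotation of the paper: the scene is explained by K slot latents z_k, each with Gaussian posterior parameters λ_k. Instead of predicting λ_k in a single encoder pass, a learned network f_φ repeatedly updates it for T steps. The auxiliary inputs a_k (for example the gradient of the ELBO with respect to λ_k) are an assumption about what the refiner sees.

```latex
\mathcal{L}(\boldsymbol{\lambda})
  = \mathbb{E}_{q_{\boldsymbol{\lambda}}(\mathbf{z})}\big[\log p_\theta(\mathbf{x}\mid\mathbf{z})\big]
  - D_{\mathrm{KL}}\big(q_{\boldsymbol{\lambda}}(\mathbf{z})\,\|\,p(\mathbf{z})\big),
\qquad
\mathbf{z}_k^{(t)} \sim q_{\boldsymbol{\lambda}_k^{(t)}}(\mathbf{z}_k)

\boldsymbol{\lambda}_k^{(t+1)}
  = \boldsymbol{\lambda}_k^{(t)}
  + f_\phi\big(\mathbf{z}_k^{(t)},\, \mathbf{x},\, \mathbf{a}_k^{(t)}\big),
\qquad
\mathbf{a}_k^{(t)} \ni \nabla_{\boldsymbol{\lambda}_k}\mathcal{L}^{(t)},
\quad k = 1,\dots,K,\; t = 1,\dots,T
```

Because the posterior is refined step by step from a sample rather than fixed by one encoder pass, different runs can settle on different explanations of an ambiguous image. This is one intuition for why iterative inference can represent multi-modal posteriors.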
Findings
Successfully segments scenes into interpretable objects without supervision
Inpaints occluded parts and generalizes to scenes with more objects
Handles ambiguous inputs with multi-modal posteriors and extends to sequences
Abstract
Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns -- without supervision -- to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn…
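To make "segmenting a scene into objects" concrete, here is a minimal sketch, assuming a spatial Gaussian mixture decoder. Each slot k proposes a mask logit and a mean value for every pixel i. The masks m_ik are a softmax over slots, and the image likelihood is the product over pixels of the sum over k of m_ik N(x_i; μ_ik, σ²). A segmentation is then read off as the argmax over slots at each pixel. The iterative part is deliberately simplified: `refine_mask_logits` is a hypothetical stand-in that takes plain gradient-ascent steps on the mask logits only. It is not the paper's learned refinement network, and it drops the variational posterior entirely. All function names and the fixed σ are illustrative assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_logpdf(x, mu, sigma):
    """Log density of N(x; mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_joint_terms(x, logits, means_at_i, sigma):
    """Per-slot log(m_ik) + log N(x_i; mu_ik, sigma^2), with m_i = softmax(logits)."""
    log_norm = logsumexp(logits)
    return [logits[k] - log_norm + gaussian_logpdf(x, means_at_i[k], sigma)
            for k in range(len(logits))], log_norm


def mixture_log_likelihood(pixels, mask_logits, means, sigma=0.1):
    """Sum over pixels of log sum_k m_ik N(x_i; mu_ik, sigma^2).

    pixels: N floats; mask_logits, means: K lists of N floats (one list per slot).
    """
    total = 0.0
    for i, x in enumerate(pixels):
        logits = [slot[i] for slot in mask_logits]
        terms, _ = log_joint_terms(x, logits, [slot[i] for slot in means], sigma)
        total += logsumexp(terms)
    return total


def refine_mask_logits(pixels, mask_logits, means, sigma=0.1, lr=1.0):
    """One gradient-ascent step on the mask logits (hypothetical simplified refiner).

    For a softmax-weighted mixture, d log p(x_i) / d logit_ik = r_ik - m_ik,
    where r_ik is the responsibility (posterior weight) of slot k for pixel i.
    Returns new logits; the input lists are not modified.
    """
    new_logits = [list(slot) for slot in mask_logits]
    for i, x in enumerate(pixels):
        logits = [slot[i] for slot in mask_logits]
        terms, log_norm = log_joint_terms(x, logits, [slot[i] for slot in means], sigma)
        log_px = logsumexp(terms)
        for k in range(len(logits)):
            resp = math.exp(terms[k] - log_px)        # r_ik
            mask = math.exp(logits[k] - log_norm)     # m_ik
            new_logits[k][i] += lr * (resp - mask)
    return new_logits


def segment(mask_logits):
    """Assign each pixel to the slot with the largest mask (argmax over slots)."""
    n_pixels = len(mask_logits[0])
    return [max(range(len(mask_logits)), key=lambda k: mask_logits[k][i])
            for i in range(n_pixels)]


if __name__ == "__main__":
    pixels = [0.0, 1.0]
    means = [[0.0, 0.0], [1.0, 1.0]]   # slot 0 explains dark pixels, slot 1 bright ones
    logits = [[0.0, 0.0], [0.0, 0.0]]  # start from uniform masks
    for _ in range(5):
        logits = refine_mask_logits(pixels, logits, means)
    print(segment(logits))  # prints [0, 1]
```

Starting from uniform masks, each refinement step shifts mask weight toward the slot whose mean best explains each pixel, so the likelihood rises and the argmax readout separates the two "objects". In the actual model, the slot means and masks come from decoding sampled latents, and the update is learned rather than a fixed gradient step.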
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
