Multi-Object Representation Learning with Iterative Variational Inference

Klaus Greff; Raphaël Lopez Kaufman; Rishabh Kabra; Nick Watters; Chris Burgess; Daniel Zoran; Loic Matthey; Matthew Botvinick; Alexander Lerchner

arXiv:1903.00450 · cs.LG · July 29, 2020 · 176 cites

TL;DR

This paper introduces a novel unsupervised method for joint object segmentation and representation learning using iterative variational inference, enabling generalization to more objects, occlusion inpainting, and sequence modeling.

Contribution

It presents a new unsupervised approach that learns to segment and represent multiple objects jointly, with iterative inference allowing for multi-modal posteriors and sequence extension.

Findings

01 Successfully segments scenes into interpretable objects without supervision

02 Inpaints occluded parts and generalizes to scenes with more objects

03 Handles ambiguous inputs with multi-modal posteriors and extends to sequences

Abstract

Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns -- without supervision -- to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn…
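
The abstract's starting assumption, that a scene is composed of multiple entities whose segmentation and representation are inferred jointly and iteratively, can be illustrated with a toy sketch. This is not the paper's method: the paper describes learned, iterative variational inference, whereas the sketch below substitutes classic EM on a Gaussian mixture over scalar pixel intensities. The function name `iterative_segmentation` and all of its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def iterative_segmentation(pixels, num_slots=2, num_iters=10, sigma=0.1):
    """Toy EM-style stand-in for iterative inference (requires num_iters >= 1).

    Each "slot" is summarised by one scalar intensity. Every iteration
    alternates between soft per-pixel masks and refined slot estimates.
    """
    pixels = np.asarray(pixels, dtype=float)
    # Deterministic initialisation: spread slot intensities over the data range.
    means = np.linspace(pixels.min(), pixels.max(), num_slots)
    for _ in range(num_iters):
        # Gaussian log-likelihood of each pixel under each slot: (num_pixels, num_slots).
        log_lik = -0.5 * ((pixels[:, None] - means[None, :]) / sigma) ** 2
        # Softmax over slots turns log-likelihoods into soft segmentation masks.
        log_lik -= log_lik.max(axis=1, keepdims=True)
        masks = np.exp(log_lik)
        masks /= masks.sum(axis=1, keepdims=True)
        # Refine each slot's intensity as the mask-weighted mean of the pixels.
        means = (masks * pixels[:, None]).sum(axis=0) / (masks.sum(axis=0) + 1e-8)
    return means, masks

# A flattened toy "image" containing two entities with different intensities.
idx = np.arange(50)
toy_pixels = np.concatenate([0.2 + 0.01 * np.sin(idx), 0.8 + 0.01 * np.cos(idx)])
slot_means, slot_masks = iterative_segmentation(toy_pixels)
segmentation = slot_masks.argmax(axis=1)  # hard assignment of each pixel to a slot
```

The point of the sketch is only the alternation: masks depend on the current slot estimates, and the slot estimates are then refined from the masks. A learned model would replace both the scalar slot description and the hand-written update rule.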

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques