Unsupervised Learning of Compositional Scene Representations from Multiple Unspecified Viewpoints

Jinyang Yuan; Bin Li; Xiangyang Xue

arXiv:2112.03568 · cs.CV · December 14, 2021

TL;DR

This paper introduces a deep generative model that learns to represent complex scenes compositionally from multiple, unspecified viewpoints without supervision, mimicking human perception of object constancy across viewpoints.

Contribution

It proposes a novel unsupervised model that separates viewpoint-independent and viewpoint-dependent features, enabling effective learning from multiple viewpoints.
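
The separation described above can be pictured with a toy sketch. This is an illustration only, not the authors' architecture: the function names, latent sizes, and the use of plain concatenation are all assumptions made here, and the inference network and decoder of the actual model are not shown. The point is just that each object gets one latent shared by every view, each view gets one latent shared by every object, and the input for rendering a given object in a given view combines the two.

```python
import random


def sample_latents(num_objects, num_views, obj_dim=4, view_dim=2, seed=0):
    """Draw toy latents (hypothetical sizes, standard normal prior assumed)."""
    rng = random.Random(seed)
    # One viewpoint-independent vector per object, reused in every view.
    obj = [[rng.gauss(0.0, 1.0) for _ in range(obj_dim)]
           for _ in range(num_objects)]
    # One viewpoint-dependent vector per view, reused for every object.
    view = [[rng.gauss(0.0, 1.0) for _ in range(view_dim)]
            for _ in range(num_views)]
    return obj, view


def compose(obj, view):
    """Build a [view][object] grid of combined latents by concatenation.

    Each entry is the object's latent followed by the view's latent.
    Concatenation is an assumption of this sketch.
    """
    return [[o + v for o in obj] for v in view]


obj, view = sample_latents(num_objects=3, num_views=2)
grid = compose(obj, view)
# The object part of object 1 is identical in view 0 and view 1.
print(grid[0][1][:4] == grid[1][1][:4])
```

Because the object part of each entry never changes across views, any property read from it is the same from every viewpoint, which is the sense of "object constancy" used in the summary.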

Findings

01. Successfully learns scene representations from synthetic datasets
02. Achieves object constancy across different viewpoints
03. Demonstrates effectiveness of the proposed model in unsupervised settings

Abstract

Visual scenes are extremely rich in diversity, not only because there are infinitely many combinations of objects and backgrounds, but also because observations of the same scene may vary greatly as the viewpoint changes. When observing a visual scene that contains multiple objects from multiple viewpoints, humans are able to perceive the scene in a compositional way from each viewpoint, while achieving so-called "object constancy" across different viewpoints, even though the exact viewpoints are not given. This ability is essential for humans to identify the same object while moving and to learn from vision efficiently. It is intriguing to design models that have a similar ability. In this paper, we consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision, and propose a deep generative model which…

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging