Spatially Invariant Unsupervised 3D Object-Centric Learning and Scene Decomposition

Tianyu Wang; Miaomiao Liu; Kee Siong Ng

arXiv:2106.05607·cs.CV·July 19, 2022



TL;DR

SPAIR3D is a framework for unsupervised object-centric learning on 3D point clouds. It decomposes a scene into objects and detects them without labels, and it scales to scenes with many objects.

Contribution

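We introduce SPAIR3D, which models a 3D point cloud as a spatial mixture with one component per object and trains it with a Chamfer Mixture Loss derived for this purpose. Describing each object relative to its local voxel grid cell lets the model handle an arbitrary number of objects and segment them without supervision.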
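This summary does not give the form of the Chamfer Mixture Loss. As background only, the sketch below computes the standard symmetric Chamfer distance between two point sets, the usual point-cloud reconstruction measure that the loss's name refers to. It is not the paper's loss, and the function names `squared_distance` and `chamfer_distance` are illustrative assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Standard symmetric Chamfer distance (NOT the paper's mixture variant).

    For every point in one set, take the squared distance to its nearest
    neighbour in the other set, average, and sum both directions.
    """
    a_to_b = sum(min(squared_distance(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(squared_distance(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)]  # same cloud moved 0.5 along y

print(chamfer_distance(cloud, cloud))    # 0.0
print(chamfer_distance(cloud, shifted))  # 0.5
```

The distance needs no point-to-point correspondence and works for sets of different sizes, which is why it suits unordered point clouds. This brute-force version costs O(N·M) distance evaluations and is meant only for small examples.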

Findings

01

Effective unsupervised scene decomposition from point clouds.

02

Capable of detecting and segmenting an unknown number of objects.

03

Demonstrates strong scalability and accuracy.

Abstract

We tackle the problem of object-centric learning on point clouds, which is crucial for high-level relational reasoning and scalable machine intelligence. In particular, we introduce a framework, SPAIR3D, to factorize a 3D point cloud into a spatial mixture model in which each component corresponds to one object. To fit this spatial mixture model to point clouds, we derive the Chamfer Mixture Loss, which fits naturally into our variational training pipeline. Moreover, we adopt an object-specification scheme that describes each object's location relative to its local voxel grid cell. This scheme allows SPAIR3D to model scenes with an arbitrary number of objects. We evaluate our method on the task of unsupervised scene decomposition. Experimental results demonstrate that SPAIR3D has strong scalability and is capable of detecting and segmenting an unknown number of objects from a point…
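The idea of describing a location relative to its local voxel grid cell can be sketched as follows. This is a minimal illustration, assuming a uniform axis-aligned grid with cubic cells. The helper names `to_cell_relative` and `to_absolute` and the `cell_size` parameter are hypothetical, and the paper's actual parameterization is not given in this summary.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Index3 = Tuple[int, int, int]


def to_cell_relative(position: Vec3, cell_size: float) -> Tuple[Index3, Vec3]:
    """Split a position into a voxel cell index and an offset in [0, 1)
    measured from that cell's minimum corner (hypothetical helper)."""
    scaled = tuple(c / cell_size for c in position)
    index = tuple(math.floor(s) for s in scaled)
    offset = tuple(s - i for s, i in zip(scaled, index))
    return index, offset


def to_absolute(index: Index3, offset: Vec3, cell_size: float) -> Vec3:
    """Inverse of to_cell_relative: recover the absolute position."""
    return tuple((i + o) * cell_size for i, o in zip(index, offset))


index, offset = to_cell_relative((2.5, -0.5, 7.0), cell_size=2.0)
print(index)   # (1, -1, 3)
print(offset)  # (0.25, 0.75, 0.5)
print(to_absolute(index, offset, cell_size=2.0))  # (2.5, -0.5, 7.0)

# Moving the object by exactly one cell along each axis changes the
# cell index but leaves the within-cell offset unchanged.
moved_index, moved_offset = to_cell_relative((4.5, 1.5, 9.0), cell_size=2.0)
print(moved_index)   # (2, 0, 4)
print(moved_offset)  # (0.25, 0.75, 0.5)
```

Because the offset depends only on where the object sits inside its cell, the same local description applies in every cell. Covering a larger scene then means adding more cells, not changing the per-cell description.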

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition · Advanced Neural Network Applications