SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks

Zoe Landgraf; Raluca Scona; Tristan Laidlow; Stephen James; Stefan Leutenegger; Andrew J. Davison

arXiv:2103.16442 · cs.CV · September 28, 2021

TL;DR

SIMstack is a novel generative model that predicts 3D shapes and instance segmentation of object stacks from a single depth view by leveraging physics-based priors in a learned latent space.

Contribution

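The paper introduces SIMstack, a depth-conditioned Variational Auto-Encoder (VAE) trained on scenes of objects stacked under physics simulation, so that the learned latent space acts as a prior for shape and instance predictions in occluded regions.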

Findings

01 Predicts the 3D shape of stacked objects from a partial view (a single depth image).

02 Enables class-agnostic instance segmentation without a predefined object count.

03 Facilitates robotic grasping of unknown objects from single depth images.

Abstract

By estimating 3D shape and instances from a single view, we can capture information about an environment quickly, without the need for comprehensive scanning and multi-view fusion. Solving this task for composite scenes (such as object stacks) is challenging: occluded areas are not only ambiguous in shape but also in instance segmentation; multiple decompositions could be valid. We observe that physics constrains decomposition as well as shape in occluded regions and hypothesise that a latent space learned from scenes built under physics simulation can serve as a prior to better predict shape and instances in occluded regions. To this end we propose SIMstack, a depth-conditioned Variational Auto-Encoder (VAE), trained on a dataset of objects stacked under physics simulation. We formulate instance segmentation as a centre voting task which allows for class-agnostic detection and doesn't…
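The centre voting idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes each occupied point carries a predicted offset to the centre of the instance it belongs to, and that points whose votes land close together form one instance. The function name `cluster_centre_votes`, the `radius` parameter, and the greedy grouping rule are hypothetical choices made for illustration; they are not taken from the paper, which may cluster votes differently.

```python
import numpy as np


def cluster_centre_votes(points, offsets, radius=0.5):
    """Group points by the instance centre each one votes for.

    Hypothetical sketch: greedy clustering of votes, not the paper's method.
    """
    points = np.asarray(points, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    votes = points + offsets              # each point's predicted centre
    centres = []                          # running mean of votes per cluster
    counts = []                           # number of votes per cluster
    labels = np.empty(len(votes), dtype=int)
    for i, vote in enumerate(votes):
        for k, centre in enumerate(centres):
            if np.linalg.norm(vote - centre) <= radius:
                # Vote falls near an existing centre: join it, update the mean.
                counts[k] += 1
                centres[k] = centre + (vote - centre) / counts[k]
                labels[i] = k
                break
        else:
            # No centre is close enough: this vote starts a new instance.
            centres.append(vote.copy())
            counts.append(1)
            labels[i] = len(centres) - 1
    return labels, np.array(centres)


# Toy data: four points whose offsets point at two distinct centres.
toy_points = [[0, 0, 0], [1, 0, 0], [5, 5, 5], [6, 5, 5]]
toy_offsets = [[0.5, 0, 0], [-0.5, 0, 0], [0.5, 0, 0], [-0.5, 0, 0]]
toy_labels, toy_centres = cluster_centre_votes(toy_points, toy_offsets)
```

In this sketch the number of instances is not fixed in advance: a new instance is created whenever a vote lands farther than `radius` from every existing centre, which is what makes a voting formulation independent of object class and object count.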

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Robotics and Sensor-Based Localization