SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks
Zoe Landgraf, Raluca Scona, Tristan Laidlow, Stephen James, Stefan Leutenegger, Andrew J. Davison

TL;DR
SIMstack is a novel generative model that predicts 3D shapes and instance segmentation of object stacks from a single depth view by leveraging physics-based priors in a learned latent space.
Contribution
It introduces a depth-conditioned VAE trained on physics-simulated scenes to improve shape and instance predictions in occluded regions.
Findings
Predicts 3D shape, including occluded regions, from a single partial depth view.
Performs class-agnostic instance segmentation without a predefined number of objects.
Supports robotic grasping of unknown objects from a single depth image.
Abstract
By estimating 3D shape and instances from a single view, we can capture information about an environment quickly, without the need for comprehensive scanning and multi-view fusion. Solving this task for composite scenes (such as object stacks) is challenging: occluded areas are not only ambiguous in shape but also in instance segmentation; multiple decompositions could be valid. We observe that physics constrains decomposition as well as shape in occluded regions and hypothesise that a latent space learned from scenes built under physics simulation can serve as a prior to better predict shape and instances in occluded regions. To this end we propose SIMstack, a depth-conditioned Variational Auto-Encoder (VAE), trained on a dataset of objects stacked under physics simulation. We formulate instance segmentation as a centre voting task which allows for class-agnostic detection and doesn't…
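The centre voting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cluster_centre_votes`, the `radius` parameter, and the greedy grouping step are all assumptions made for illustration. The only idea taken from the abstract is that each occupied location votes for the centre of the instance it belongs to, so the number of instances does not have to be fixed in advance.

```python
import numpy as np

def cluster_centre_votes(coords, offsets, radius=1.0):
    """Group voxels by the instance centre they vote for.

    coords:  (N, 3) positions of occupied voxels
    offsets: (N, 3) predicted offset from each voxel to its instance centre
    radius:  hypothetical merge distance (an assumption, not from the paper)
    Returns (labels, centres); labels has shape (N,).
    """
    votes = np.asarray(coords, dtype=float) + np.asarray(offsets, dtype=float)
    labels = np.full(len(votes), -1, dtype=int)
    centres = []
    for i, vote in enumerate(votes):
        if labels[i] != -1:
            continue  # this voxel already belongs to an instance
        # Start a new instance at the first unassigned vote and absorb
        # every other unassigned vote that lies within `radius` of it.
        near = (np.linalg.norm(votes - vote, axis=1) <= radius) & (labels == -1)
        labels[near] = len(centres)
        centres.append(votes[near].mean(axis=0))
    return labels, np.array(centres)

# Toy example: four voxels whose votes land on two distinct centres.
coords = np.array([[0, 0, 1], [0, 1, 0], [5, 5, 6], [5, 6, 5]], dtype=float)
true_centres = np.array([[0, 0, 0], [0, 0, 0], [5, 5, 5], [5, 5, 5]], dtype=float)
offsets = true_centres - coords  # stand-in for the network's predicted offsets
labels, centres = cluster_centre_votes(coords, offsets)
```

In this toy case the instance count (two) comes out of the grouping step rather than being supplied as an input, which is the property the abstract describes as class-agnostic detection.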
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
