ObPose: Leveraging Pose for Object-Centric Scene Inference and Generation in 3D
Yizhe Wu, Oiwi Parker Jones, Ingmar Posner

TL;DR
ObPose is an unsupervised 3D scene inference and generation model that learns object-centric representations by disentangling object location and appearance, leveraging pose as an inductive bias, and modeling scenes as compositions of NeRFs.
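Here is a minimal sketch of one common rule for modelling a scene as a composition of NeRFs, used for example in compositional NeRF models such as GIRAFFE. The scene density at a point is the sum of the per-object densities, and the colour is the density-weighted average of the per-object colours. This rule is an illustrative assumption and is not necessarily ObPose's exact composition. The helper `compose_fields` is hypothetical.

```python
import numpy as np

def compose_fields(densities, colors):
    """Combine K per-object radiance fields evaluated at N sample points.

    densities: shape (K, N), non-negative volume density of each object
    colors:    shape (K, N, 3), RGB emitted by each object
    Returns the scene density, shape (N,), and the density-weighted colour, shape (N, 3).
    """
    densities = np.asarray(densities, dtype=float)
    colors = np.asarray(colors, dtype=float)
    # Scene density is the sum of the object densities
    sigma = densities.sum(axis=0)
    # Each object's share of the density at each point; eps avoids 0/0 in empty space
    weights = densities / (sigma + 1e-10)
    rgb = (weights[..., None] * colors).sum(axis=0)
    return sigma, rgb
```

Suppose a red object has density 1 at a point and a blue object has density 3 there. The point then has density 4 and colour (0.25, 0, 0.75). A point where both densities are zero gets zero density and black.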
Contribution
ObPose introduces a novel unsupervised approach that uses pose as an inductive bias and voxelised NeRF approximations for object-centric 3D scene inference and generation.
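The sketch below illustrates the general idea of approximating an object's shape with voxels. It assumes the object's NeRF exposes a batched density query (`density_fn`, hypothetical). It also assumes the shape is recovered by thresholding density at the voxel centres inside a fixed bounding cube. The bounds, resolution and threshold are illustrative choices and are not values taken from the paper.

```python
import numpy as np

def voxelise_density(density_fn, bounds=(-1.0, 1.0), resolution=32, threshold=0.5):
    """Approximate an object's shape by thresholding a density field on a voxel grid.

    density_fn: hypothetical callable mapping an (M, 3) array of points to (M,) densities,
                standing in for a trained per-object NeRF.
    Returns the centres of occupied voxels, shape (K, 3), and the voxel edge length.
    """
    lo, hi = bounds
    size = (hi - lo) / resolution
    # Voxel centres along one axis, offset by half a voxel from the cube boundary
    axis = lo + size * (np.arange(resolution) + 0.5)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    centres = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    # Query the field once at every voxel centre; a voxel is occupied if density exceeds the threshold
    occupied = density_fn(centres) > threshold
    return centres[occupied], size
```

Consider a toy density field that is 10 inside a sphere of radius 0.5 and 0 outside. The occupied voxels it yields have a total volume (count times the cube of the edge length) close to the sphere's volume of about 0.52.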
Findings
Outperforms the state of the art in 3D scene inference on multiple datasets
Enables flexible scene editing and novel scene generation
Validates key design choices through ablation studies
Abstract
We present ObPose, an unsupervised object-centric inference and generation model which learns 3D-structured latent representations from RGB-D scenes. Inspired by prior art in 2D representation learning, ObPose considers a factorised latent space, separately encoding object location (where) and appearance (what). ObPose further leverages an object's pose (i.e. location and orientation), defined via a minimum volume principle, as a novel inductive bias for learning the where component. To achieve this, we propose an efficient, voxelised approximation approach to recover the object shape directly from a neural radiance field (NeRF). As a consequence, ObPose models each scene as a composition of NeRFs, richly representing individual objects. To evaluate the quality of the learned representations, ObPose is evaluated quantitatively on the YCB, MultiShapeNet, and CLEVR datasets for…
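The minimum volume principle can be illustrated on a point cloud, such as the occupied voxels of a single object. The idea is to choose the orientation whose bounding box has the smallest volume and to take the box centre as the object's location. For simplicity, this sketch searches only over rotations about the vertical z-axis (yaw), using a fixed grid of angles. That restriction, the angle resolution, and the helper names `rot_z` and `min_volume_pose` are assumptions for illustration and are not the paper's procedure.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix for a yaw of theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def min_volume_pose(points, n_angles=180):
    """Yaw-only minimum-volume bounding box of an (N, 3) point cloud.

    Returns (volume, yaw angle, box centre in world frame, box extents in the box frame).
    """
    points = np.asarray(points, dtype=float)
    best = None
    # A box is unchanged by 90-degree yaw rotations, so angles in [0, pi/2) suffice
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        R = rot_z(theta)
        # Row-vector product p @ R equals R^T p: coordinates in the rotated frame
        local = points @ R
        mins, maxs = local.min(axis=0), local.max(axis=0)
        volume = np.prod(maxs - mins)
        if best is None or volume < best[0]:
            # Map the box centre from the rotated frame back to the world frame
            centre = R @ ((mins + maxs) / 2.0)
            best = (volume, theta, centre, maxs - mins)
    return best
```

As a design note, restricting the search to yaw reduces it to a one-dimensional sweep. Recovering a full 3D orientation would require searching over all rotations.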
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Robotics and Sensor-Based Localization
