Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives
Tom Monnier, Jake Austin, Angjoo Kanazawa, Alexei A. Efros, Mathieu Aubry

TL;DR
This paper introduces a differentiable rendering-based method to decompose scenes into interpretable textured 3D primitives directly from images, enabling scene editing and physics simulations.
Contribution
The paper models primitives as textured superquadric meshes and optimizes them via differentiable rendering to decompose a scene directly from images. It also models the transparency of each primitive.
Findings
Faithful reconstruction of input images using textured primitives
Accurate modeling of visible 3D points and amodal shape completion
Robust performance on diverse real-world scenes
Abstract
Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate and suited for physics-based simulations. Moreover, unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images through differentiable rendering. Specifically, we model primitives as textured superquadric meshes and optimize their parameters from scratch with an image rendering loss. We highlight the importance of modeling transparency for each primitive, which is critical for optimization and also enables handling varying numbers of…
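To make the superquadric primitive mentioned in the abstract concrete, the following is a minimal sketch of the standard superquadric parametrization and its implicit (inside-outside) function. It is not the authors' implementation. The function names (`signed_pow`, `superquadric_points`, `inside_outside`), the sampling grid, and the parameter names `scale`, `eps1`, `eps2` are assumptions chosen for illustration. Texturing, transparency, pose, and differentiable rendering are not covered.

```python
import math

def signed_pow(value, exponent):
    """Sign-preserving power: sgn(v) * |v|**exponent."""
    return math.copysign(abs(value) ** exponent, value)

def superquadric_points(scale, eps1, eps2, n_eta=16, n_omega=32):
    """Sample surface points of a superquadric (hypothetical helper).

    scale: (a1, a2, a3) axis lengths; eps1, eps2: shape exponents.
    eps1 = eps2 = 1 gives an ellipsoid; values near 0 approach a box.
    """
    a1, a2, a3 = scale
    points = []
    for i in range(n_eta):
        # Latitude in [-pi/2, pi/2]
        eta = -math.pi / 2 + math.pi * i / (n_eta - 1)
        for j in range(n_omega):
            # Longitude in [-pi, pi)
            omega = -math.pi + 2 * math.pi * j / n_omega
            x = a1 * signed_pow(math.cos(eta), eps1) * signed_pow(math.cos(omega), eps2)
            y = a2 * signed_pow(math.cos(eta), eps1) * signed_pow(math.sin(omega), eps2)
            z = a3 * signed_pow(math.sin(eta), eps1)
            points.append((x, y, z))
    return points

def inside_outside(point, scale, eps1, eps2):
    """Implicit function: < 1 inside, 1 on the surface, > 1 outside."""
    x, y, z = (abs(p / a) for p, a in zip(point, scale))
    xy_term = (x ** (2 / eps2) + y ** (2 / eps2)) ** (eps2 / eps1)
    return xy_term + z ** (2 / eps1)

if __name__ == "__main__":
    scale, eps1, eps2 = (1.0, 2.0, 0.5), 0.5, 1.5
    pts = superquadric_points(scale, eps1, eps2)
    worst = max(abs(inside_outside(p, scale, eps1, eps2) - 1.0) for p in pts)
    print(f"{len(pts)} points, max deviation from surface: {worst:.2e}")
```

Because the shape is controlled by only two exponents and three scales, the same primitive can range continuously from box-like to ellipsoidal. This is one reason superquadrics are a common choice when shape parameters are optimized by gradient descent.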
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
