Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

Shubham Tulsiani; Saurabh Gupta; David Fouhey; Alexei A. Efros; Jitendra Malik

arXiv:1712.01812·cs.CV·April 25, 2018

TL;DR

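This paper presents a CNN-based method that recovers the 3D structure of a scene from a single 2D image as a small set of factors: a layout for the enclosing surfaces and a set of objects, each described by its shape and pose. The method is benchmarked on a large dataset of indoor scenes.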

Contribution

It introduces a CNN-based approach for factoring a 3D scene into layout, object shape, and object pose from a single 2D image, and benchmarks it on a large dataset of indoor scenes.

Findings

01

Successful inference of the factored representation (layout, object shape, and object pose) from a single 2D image

02

Quantitative and qualitative evidence of the proposed representation's merits compared to alternate representations

03

Insights into practical design choices for scene factorization

Abstract

The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose. We propose a convolutional neural network-based approach to predict this representation and benchmark it on a large dataset of indoor scenes. Our experiments evaluate a number of practical design questions, demonstrate that we can infer this representation, and quantitatively and qualitatively demonstrate its merits compared to alternate representations.
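To make the factored representation described in the abstract concrete, here is a minimal sketch of how a scene could be stored as a layout plus a list of objects, each with a shape and a pose. The abstract does not specify any parameterization, so everything below is an illustrative assumption rather than the authors' method: the class names (`Pose`, `SceneObject`, `Layout`, `FactoredScene`), the choice of a point list for shape, planes for layout, and a scale/yaw/translation pose are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pose:
    # Assumed parameterization (not from the paper): per-axis scale,
    # rotation about the vertical (y) axis in radians, then translation.
    scale: Vec3
    yaw: float
    translation: Vec3


@dataclass
class SceneObject:
    # Shape is stored in a canonical, object-centered frame.
    # A plain point list stands in for whatever shape encoding is used.
    shape_points: List[Vec3]
    pose: Pose


@dataclass
class Layout:
    # Enclosing surfaces, assumed here to be planes given as (normal, offset).
    planes: List[Tuple[Vec3, float]] = field(default_factory=list)


@dataclass
class FactoredScene:
    layout: Layout
    objects: List[SceneObject] = field(default_factory=list)


def to_scene_frame(obj: SceneObject) -> List[Vec3]:
    """Place an object's canonical shape into the scene: scale, rotate, translate."""
    sx, sy, sz = obj.pose.scale
    tx, ty, tz = obj.pose.translation
    c, s = math.cos(obj.pose.yaw), math.sin(obj.pose.yaw)
    placed = []
    for x, y, z in obj.shape_points:
        x, y, z = x * sx, y * sy, z * sz       # scale in the canonical frame
        xr, zr = c * x + s * z, -s * x + c * z  # rotate about the y axis
        placed.append((xr + tx, y + ty, zr + tz))
    return placed


# Hypothetical usage: one floor plane and one object with a single shape point.
chair = SceneObject(
    shape_points=[(1.0, 0.0, 0.0)],
    pose=Pose(scale=(2.0, 1.0, 1.0), yaw=0.0, translation=(0.0, 0.0, 3.0)),
)
scene = FactoredScene(layout=Layout(planes=[((0.0, 1.0, 0.0), 0.0)]), objects=[chair])
placed_points = to_scene_frame(scene.objects[0])
```

The point of the factoring is visible in the structure: shape is kept separate from pose, so the same canonical shape can be placed anywhere in the scene, and the layout is kept separate from the objects it encloses.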
