Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Xinchen Yan; Jimei Yang; Ersin Yumer; Yijie Guo; Honglak Lee

arXiv:1612.00814 · cs.CV · August 15, 2017 · 316 cites

TL;DR

This paper introduces Perspective Transformer Nets, a novel approach for single-view 3D object reconstruction that learns without explicit 3D supervision by leveraging a perspective-based projection loss.

Contribution

It proposes an encoder-decoder network trained with a projection loss defined by the perspective transformation, which allows 3D reconstruction to be learned from 2D observations without explicit 3D supervision.
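To make the idea of a perspective projection loss concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the pinhole camera model, the nearest-neighbour lookup, the mean squared error, and every name and default value (`project_silhouette`, `projection_loss`, `focal`, `cam_dist`, `out_size`, `n_depth`) are assumptions made for illustration. The sketch casts one ray per pixel through an occupancy grid, takes the maximum occupancy along each ray as the silhouette, and compares that silhouette with a 2D mask.

```python
import numpy as np


def project_silhouette(volume, focal=2.0, cam_dist=3.0, out_size=32, n_depth=32):
    """Project a cubic occupancy grid onto a 2D silhouette (illustrative only).

    volume: (N, N, N) array in [0, 1], indexed [z, y, x], assumed to span the
    cube [-1, 1]^3. A hypothetical pinhole camera sits at z = -cam_dist and
    looks along +z.
    """
    n = volume.shape[0]
    # Image-plane coordinates and the depths that cover the cube.
    u = np.linspace(-1.0, 1.0, out_size)
    v = np.linspace(-1.0, 1.0, out_size)
    d = np.linspace(cam_dist - 1.0, cam_dist + 1.0, n_depth)
    dd, vv, uu = np.meshgrid(d, v, u, indexing="ij")

    # Perspective back-projection: pixel (u, v) at depth d maps to
    # (u * d / f, v * d / f, d) in camera space; shift z into world space.
    x = uu * dd / focal
    y = vv * dd / focal
    z = dd - cam_dist

    def to_index(coord):
        # Map world coordinates in [-1, 1] to nearest voxel indices.
        return np.rint((coord + 1.0) * 0.5 * (n - 1)).astype(int)

    ix, iy, iz = to_index(x), to_index(y), to_index(z)
    inside = (
        (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n) & (iz >= 0) & (iz < n)
    )

    # Sample occupancy along each ray; points outside the cube count as empty.
    samples = np.zeros(dd.shape, dtype=float)
    samples[inside] = volume[iz[inside], iy[inside], ix[inside]]

    # A pixel is covered if any sample along its ray is occupied.
    return samples.max(axis=0)


def projection_loss(volume, target_mask, **camera):
    """Mean squared error between the projected silhouette and a 2D mask."""
    silhouette = project_silhouette(volume, **camera)
    return float(np.mean((silhouette - target_mask) ** 2))
```

Nearest-neighbour lookup keeps the sketch short but is not differentiable with respect to the sampling positions; a trainable version would need an interpolating sampler and an autodiff framework.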

Findings

01. The model successfully reconstructs 3D volumes from single 2D images.

02. It generalizes well across multiple object classes.

03. Projection loss improves reconstruction accuracy and generalization.

Abstract

Understanding the 3D world is a fundamental problem in computer vision. However, learning a good representation of 3D objects is still an open problem due to the high dimensionality of the data and many factors of variation involved. In this work, we investigate the task of single-view 3D object reconstruction from a learning agent's perspective. We formulate the learning process as an interaction between 3D and 2D representations and propose an encoder-decoder network with a novel projection loss defined by the perspective transformation. More importantly, the projection loss enables the unsupervised learning using 2D observation without explicit 3D supervision. We demonstrate the ability of the model in generating 3D volume from a single 2D image with three sets of experiments: (1) learning from single-class objects; (2) learning from multi-class objects and (3) testing on novel…

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization