Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
Jimei Yang, Scott Reed, Ming-Hsuan Yang, Honglak Lee

TL;DR
This paper introduces a recurrent convolutional encoder-decoder network that synthesizes novel views of a 3D object from a single image, disentangling latent factors such as identity and pose for specific object categories (faces and chairs).
Contribution
It presents a novel recurrent convolutional encoder-decoder architecture trained end-to-end for 3D view synthesis from a single image, capable of disentangling latent factors without full supervision.
Findings
High-quality view synthesis for faces and chairs.
Effective disentangling of identity and pose factors.
Demonstrated on Multi-PIE and 3D chair datasets.
Abstract
An important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D…
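The recurrent encoder-decoder idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: random linear maps stand in for the trained convolutional encoder and decoder, the pose update is a simple tanh transition chosen for brevity, and all names, sizes, and the one-hot action encoding (`W_enc`, `W_rot`, `ACTION_DIM`, `clockwise`, and so on) are assumptions made for illustration. The point it shows is structural: the image is encoded once into identity and pose codes, only the pose code is updated at each step, and every step is decoded into a new view.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
IMG_DIM, ID_DIM, POSE_DIM, ACTION_DIM = 64, 16, 8, 3

# Random linear maps stand in for the trained convolutional encoder and decoder.
W_enc = rng.normal(scale=0.1, size=(ID_DIM + POSE_DIM, IMG_DIM))
W_dec = rng.normal(scale=0.1, size=(IMG_DIM, ID_DIM + POSE_DIM))
# Recurrent transition: new pose from the current pose and an action signal.
W_rot = rng.normal(scale=0.1, size=(POSE_DIM, POSE_DIM + ACTION_DIM))


def encode(image):
    """Map a flattened image to separate identity and pose codes."""
    h = np.tanh(W_enc @ image)
    return h[:ID_DIM], h[ID_DIM:]


def rotate(pose, action):
    """Apply one transformation step to the pose code only."""
    return np.tanh(W_rot @ np.concatenate([pose, action]))


def decode(identity, pose):
    """Render a flattened image from identity and pose codes."""
    return W_dec @ np.concatenate([identity, pose])


def synthesize_views(image, actions):
    """Encode once, then repeatedly update the pose code and decode each step."""
    identity, pose = encode(image)
    views = []
    for action in actions:
        pose = rotate(pose, action)  # the identity code is carried over unchanged
        views.append(decode(identity, pose))
    return identity, views


image = rng.normal(size=IMG_DIM)          # placeholder for a real input image
clockwise = np.array([1.0, 0.0, 0.0])     # hypothetical one-hot action encoding
identity, views = synthesize_views(image, [clockwise] * 4)
```

Because the identity code is computed once and reused at every step, any change between successive outputs has to come from the pose code. In a trained model this separation is what would let a sequence of rotations be rendered from one input image.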
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
