TL;DR
SynSin introduces an end-to-end model for single-image view synthesis that leverages a differentiable point cloud renderer, enabling high-resolution, realistic view generation from real images without ground-truth 3D data.
Contribution
The paper presents a novel differentiable point cloud renderer and an end-to-end training approach for view synthesis from a single image without requiring ground-truth 3D information.
Findings
Outperforms prior methods on Matterport, Replica, and RealEstate10K datasets.
Generates high-resolution, realistic images from a single input.
Allows interpretable manipulation of the 3D latent space.
Abstract
Single image view synthesis allows for the generation of new views of a scene given a single input image. This is challenging, as it requires comprehensively understanding the 3D scene from a single image. As a result, current methods typically use multiple images, train on ground-truth depth, or are limited to synthetic data. We propose a novel end-to-end model for this task; it is trained on real images without any ground-truth 3D information. To this end, we introduce a novel differentiable point cloud renderer that is used to transform a latent 3D point cloud of features into the target view. The projected features are decoded by our refinement network to inpaint missing regions and generate a realistic output image. The 3D component inside of our generative model allows for interpretable manipulation of the latent feature space at test time, e.g. we can animate trajectories from a…
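The reprojection step the abstract describes, moving a point cloud of per-pixel features into a target view, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's renderer: the function name `reproject_features`, the pinhole intrinsics `K`, the 4x4 pose `T_src_to_tgt`, and the per-pixel `depth` input are all hypothetical, and the hard nearest-pixel z-buffer used here is not differentiable with respect to point positions, which the paper's renderer is stated to be.

```python
import numpy as np

def reproject_features(features, depth, K, T_src_to_tgt):
    """Hypothetical sketch: splat per-pixel features into a target view.

    features: (H, W, C) feature map, depth: (H, W) positive depths,
    K: (3, 3) pinhole intrinsics, T_src_to_tgt: (4, 4) rigid transform.
    Returns the target feature map and a mask of pixels no point landed on.
    """
    H, W, C = features.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Unproject every pixel to a 3D point in the source camera frame.
    pts_src = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)

    # Move the points into the target camera frame.
    pts_h = np.concatenate([pts_src, np.ones((pts_src.shape[0], 1))], axis=1)
    pts_tgt = (pts_h @ T_src_to_tgt.T)[:, :3]

    # Project into the target image plane (guard against z <= 0).
    z = pts_tgt[:, 2]
    safe_z = np.where(z > 1e-6, z, 1.0)
    proj = pts_tgt @ K.T
    u_t = np.rint(proj[:, 0] / safe_z).astype(int)
    v_t = np.rint(proj[:, 1] / safe_z).astype(int)
    valid = (z > 1e-6) & (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)

    # Hard z-buffer: the nearest point wins each pixel (an assumption of this
    # sketch; a differentiable renderer would blend points softly instead).
    out = np.zeros((H, W, C))
    zbuf = np.full((H, W), np.inf)
    feats = features.reshape(-1, C)
    for i in np.flatnonzero(valid):
        if z[i] < zbuf[v_t[i], u_t[i]]:
            zbuf[v_t[i], u_t[i]] = z[i]
            out[v_t[i], u_t[i]] = feats[i]

    hole_mask = np.isinf(zbuf)  # regions a refinement stage would need to inpaint
    return out, hole_mask
```

The returned `hole_mask` marks target pixels that received no point, which corresponds to the missing regions the abstract says the refinement network inpaints.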
Videos
SynSin: End-to-End View Synthesis From a Single Image (YouTube)
