TL;DR
This paper introduces a new framework for synthesizing novel views of 360° scenes from very few input images, combining pose estimation, synthetic view generation, and image enhancement to improve quality and coverage.
Contribution
The proposed method jointly estimates camera poses, generates synthetic views, and retrains a diffusion model, enabling better scene coverage and image quality in extremely sparse-view scenarios.
Findings
Significant improvement over benchmarks with only four input views.
Effective dense sampling from the upper hemisphere enhances scene coverage.
Enhanced image quality through diffusion-based artifact removal.
Abstract
Novel view synthesis in 360° scenes from extremely sparse input views is essential for applications like virtual reality and augmented reality. This paper presents a novel framework for novel view synthesis in extremely sparse-view cases. As typical structure-from-motion methods are unable to estimate camera poses in extremely sparse-view cases, we apply DUSt3R to estimate camera poses and generate a dense point cloud. Using the poses of estimated cameras, we densely sample additional views from the upper hemisphere space of the scenes, from which we render synthetic images together with the point cloud. Training a 3D Gaussian Splatting model on a combination of reference images from sparse views and densely sampled synthetic images allows a larger scene coverage in 3D space, addressing the overfitting challenge due to the limited input in sparse-view cases. Retraining a…
