Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos
Ruoyu Wang, Yi Ma, Shenghua Gao

TL;DR
This paper introduces a two-stage self-supervised approach for novel view synthesis from uncalibrated videos, combining implicit scene learning with explicit 3D primitive prediction to achieve high-quality results without prior geometric information.
Contribution
It proposes a novel two-stage training strategy that enables view synthesis from raw uncalibrated videos without geometric priors, bridging the gap between implicit and explicit 3D representations.
Findings
Achieves high-quality novel view synthesis without camera calibration.
Provides accurate camera pose estimation from uncalibrated videos.
Demonstrates effectiveness on large-scale uncalibrated video datasets.
Abstract
Currently almost all state-of-the-art novel view synthesis and reconstruction models rely on calibrated cameras or additional geometric priors for training. These prerequisites significantly limit their applicability to massive uncalibrated data. To alleviate this requirement and unlock the potential for self-supervised training on large-scale uncalibrated videos, we propose a novel two-stage strategy to train a view synthesis model from only raw video frames or multi-view images, without providing camera parameters or other priors. In the first stage, we learn to reconstruct the scene implicitly in a latent space without relying on any explicit 3D representation. Specifically, we predict per-frame latent camera and scene context features, and employ a view synthesis model as a proxy for explicit rendering. This pretraining stage substantially reduces the optimization complexity and…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
Methods: ALIGN
