TL;DR
This paper synthesizes novel views of dynamic scenes from a single monocular video by jointly modeling static and dynamic components with neural implicit representations, so that a view can be rendered at an arbitrary viewpoint and any input time step.
Contribution
It proposes a framework that jointly trains a time-invariant static NeRF and a time-varying dynamic NeRF, learns to blend their outputs without supervision, and adds regularization losses to handle the ill-posed nature of learning from a single video.
Findings
Achieves high-quality dynamic view synthesis from casual monocular videos.
Demonstrates that regularization losses resolve the ambiguity of single-video learning by encouraging a more physically plausible solution.
Provides extensive quantitative and qualitative results.
Abstract
We present an algorithm for generating novel views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene. Our work builds upon recent advances in neural implicit representation and uses continuous and differentiable functions for modeling the time-varying structure and the appearance of the scene. We jointly train a time-invariant static NeRF and a time-varying dynamic NeRF, and learn how to blend the results in an unsupervised manner. However, learning this implicit function from a single video is highly ill-posed (with infinitely many solutions that match the input video). To resolve the ambiguity, we introduce regularization losses to encourage a more physically plausible solution. We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
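To make the blending idea concrete, here is a minimal sketch of how static and dynamic samples along one camera ray could be combined into a single pixel color. It is an illustration under stated assumptions, not the paper's formulation: the function name `render_blended_ray`, the per-sample blend weight in [0, 1], and the density-weighted mixing rule are all hypothetical choices made for this example. In the actual method the densities, colors, and blend weights would come from the two trained networks.

```python
import math


def render_blended_ray(sigma_s, rgb_s, sigma_d, rgb_d, blend, deltas):
    """Composite static and dynamic samples along one ray.

    Every argument is a per-sample list of equal length:
      sigma_s, sigma_d: non-negative densities from the static / dynamic model
      rgb_s, rgb_d:     (r, g, b) colors from the static / dynamic model
      blend:            weight in [0, 1]; 1 means "fully dynamic" (assumed convention)
      deltas:           distance between adjacent samples
    Returns (color, transmittance), where transmittance is the fraction of
    light that passes through all samples.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for ss, cs, sd, cd, b, dt in zip(sigma_s, rgb_s, sigma_d, rgb_d, blend, deltas):
        # Assumed mixing rule: scale each density by its blend weight.
        w_s = (1.0 - b) * ss
        w_d = b * sd
        sigma = w_s + w_d
        if sigma <= 0.0:
            continue  # empty space contributes nothing
        # Standard volume-rendering opacity for the mixed density.
        alpha = 1.0 - math.exp(-sigma * dt)
        for k in range(3):
            # Color is the density-weighted average of the two models.
            mixed = (w_s * cs[k] + w_d * cd[k]) / sigma
            color[k] += transmittance * alpha * mixed
        transmittance *= 1.0 - alpha
    return color, transmittance


if __name__ == "__main__":
    # One dense sample: green in the static model, red in the dynamic model.
    green, red = (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)
    for b in (0.0, 0.5, 1.0):
        c, t = render_blended_ray([50.0], [green], [50.0], [red], [b], [1.0])
        print(b, [round(v, 3) for v in c], round(t, 3))
```

With this mixing rule the accumulated sample weights never exceed one, so the rendered color stays within the range of the input colors. A blend weight of 0 reduces to rendering the static model alone, and a weight of 1 to the dynamic model alone.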
