S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix
Peng Dai, Feitong Tan, Qiangeng Xu, Yihua Huang, David Futschik, Ruofei Du, Sean Fanello, Yinda Zhang, Xiaojuan Qi

TL;DR
This paper introduces a pose-free, training-free method for generating 3D stereoscopic and spatial videos from monocular videos using a novel frame matrix inpainting framework and a dual update scheme, enhancing immersive video synthesis.
Contribution
The authors propose a new approach that leverages existing monocular video models to produce 3D videos without additional training or fine-tuning, using depth-based warping and inpainting techniques.
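As a rough illustration of the depth-based warping step, the sketch below forward-warps a single frame to a horizontally shifted camera and records which target pixels received no source pixel. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function name `warp_to_shifted_view`, the parameters `focal` and `baseline`, and the pinhole disparity relation `focal * baseline / depth` are all hypothetical choices made for this example.

```python
def warp_to_shifted_view(image, depth, focal, baseline):
    """Forward-warp one frame to a horizontally shifted camera.

    image: list of rows of pixel values; depth: same shape, positive depths.
    focal and baseline are hypothetical camera parameters (assumption).
    Returns (warped, hole_mask); hole_mask[y][x] is True where no source
    pixel landed, i.e. the region an inpainting model would have to fill.
    """
    height, width = len(image), len(image[0])
    warped = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            z = depth[y][x]
            # Disparity in pixels: nearer points shift more than distant ones.
            disparity = focal * baseline / z
            tx = x - int(round(disparity))
            # Z-buffer test: when two pixels collide, keep the nearer one.
            if 0 <= tx < width and z < zbuf[y][tx]:
                zbuf[y][tx] = z
                warped[y][tx] = image[y][x]
    hole_mask = [[value is None for value in row] for row in warped]
    return warped, hole_mask


# One-row example: two far pixels (depth 4) followed by two near pixels (depth 2).
warped, holes = warp_to_shifted_view(
    image=[[10, 20, 30, 40]],
    depth=[[4.0, 4.0, 2.0, 2.0]],
    focal=4.0,
    baseline=1.0,
)
print(warped)  # [[30, 40, None, None]]
print(holes)   # [[False, False, True, True]]
```

In the example, the near pixel 30 overwrites the far pixel 20 at the same target location, and the two rightmost target pixels are left empty. Those empty pixels form the disocclusion mask that an inpainting stage would then need to fill.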
Findings
Significant improvement over previous methods in 3D video generation quality.
Effective multi-view video synthesis validated on various generative models.
Enhanced spatial and temporal consistency in generated videos.
Abstract
While video generation models excel at producing high-quality monocular videos, generating 3D stereoscopic and spatial videos for immersive applications remains an underexplored challenge. We present a pose-free and training-free method that leverages an off-the-shelf monocular video generation model to produce immersive 3D videos. Our approach first warps the generated monocular video into pre-defined camera viewpoints using estimated depth information, then applies a novel frame matrix inpainting framework. This framework utilizes the original video generation model to synthesize missing content across different viewpoints and timestamps, ensuring spatial and temporal consistency without requiring additional model fine-tuning. Moreover, we develop a dual-update scheme that further improves the quality of video inpainting by alleviating the negative effects propagated from…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Vision and Imaging
