DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki

TL;DR
DreamScene4D generates 3D dynamic multi-object scenes from monocular videos by decomposing each scene into background and objects, enabling 4D scene synthesis and object tracking.
Contribution
DreamScene4D is the first approach to generate 3D dynamic scenes of multiple objects from monocular videos, using a decompose-recompose strategy for scene and motion modeling.
Findings
Effective 4D scene generation on challenging datasets
Accurate 2D persistent point tracking from 3D trajectories
Superior performance in quantitative and user preference evaluations
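The second finding, 2D point tracks obtained from 3D trajectories, can be illustrated with a standard pinhole projection. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `project_trajectory`, the intrinsics (`focal`, `cx`, `cy`), and the sample trajectory are all hypothetical, and points are assumed to be given in camera coordinates.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_trajectory(
    trajectory: List[Point3D], focal: float, cx: float, cy: float
) -> List[Point2D]:
    """Project one 3D point trajectory (camera coordinates) to pixel coordinates.

    Assumes a simple pinhole camera with a single focal length and a
    principal point at (cx, cy). These intrinsics are illustrative only.
    """
    track = []
    for x, y, z in trajectory:
        if z <= 0:
            raise ValueError("point is not in front of the camera")
        # Perspective division followed by a shift to the principal point.
        track.append((focal * x / z + cx, focal * y / z + cy))
    return track


# Hypothetical trajectory of a single point over three frames.
trajectory = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0), (1.0, 0.5, 2.0)]
track_2d = project_trajectory(trajectory, focal=100.0, cx=64.0, cy=64.0)
```

Because the same 3D point is followed across frames, the projected 2D track stays attached to that point for the whole sequence, which is the sense of "persistent" used here.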
Abstract
View-predictive generative models provide strong priors for lifting object-centric images and videos into 3D and 4D through rendering and score distillation objectives. A question then remains: what about lifting complete multi-object dynamic scenes? There are two challenges in this direction: first, rendering error gradients are often insufficient to recover fast object motion, and second, view-predictive generative models work much better for objects than for whole scenes, so score distillation objectives cannot currently be applied directly at the scene level. We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via 360-degree novel view synthesis. Our key insight is a "decompose-recompose" approach that factorizes the video scene into the background and object tracks, while also factorizing object motion into 3 components:…
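The "decompose-recompose" idea in the abstract can be pictured with a toy 2D example: split a frame into a background layer and per-object layers using masks, then composite them back. This sketch is only an analogy under stated assumptions. The paper operates on 3D/4D scene representations, and the helpers `decompose` and `recompose`, the mask format, and the sample frame are hypothetical; the motion factorization is not shown because the abstract is truncated here.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]              # grayscale frame as rows of pixel values
Mask = List[List[bool]]              # True where a pixel belongs to an object
Layer = List[List[Optional[int]]]    # None marks pixels outside the layer


def decompose(frame: Image, masks: List[Mask]) -> Tuple[Layer, List[Layer]]:
    """Split a frame into a background layer and one layer per object mask."""
    h, w = len(frame), len(frame[0])
    object_layers = [
        [[frame[r][c] if m[r][c] else None for c in range(w)] for r in range(h)]
        for m in masks
    ]
    # Background keeps only the pixels that no object mask covers.
    background = [
        [None if any(m[r][c] for m in masks) else frame[r][c] for c in range(w)]
        for r in range(h)
    ]
    return background, object_layers


def recompose(background: Layer, object_layers: List[Layer], fill: int = 0) -> Image:
    """Composite object layers over the background; later layers are drawn on top."""
    out = [[fill if v is None else v for v in row] for row in background]
    for layer in object_layers:
        for r, row in enumerate(layer):
            for c, v in enumerate(row):
                if v is not None:
                    out[r][c] = v
    return out


# Hypothetical 2x3 frame with two non-overlapping objects.
frame = [[1, 2, 3], [4, 5, 6]]
masks = [
    [[True, False, False], [False, False, False]],
    [[False, False, False], [False, True, True]],
]
background, layers = decompose(frame, masks)
restored = recompose(background, layers)
```

In this toy setting, recomposing the untouched layers reproduces the input frame; the point of the factorization is that each layer could be processed separately before being put back together.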
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
Methods: Diffusion
