SceneScape: Text-Driven Consistent Scene Generation
Rafail Fridman, Amit Abecasis, Yoni Kasten, Tali Dekel

TL;DR
SceneScape introduces a novel text-driven method for generating long-term, 3D-consistent videos of diverse scenes by combining pre-trained models with online training to ensure geometric plausibility.
Contribution
The paper presents a new framework that synthesizes consistent scene videos from text prompts using depth priors and online training, enabling diverse scene generation beyond limited domains.
Findings
Generates long-term videos with 3D consistency.
Produces diverse scenes like spaceships, caves, and ice castles.
Uses online test-time training for geometric consistency.
Abstract
We present a method for text-driven perpetual view generation -- synthesizing long-term videos of various scenes given only an input text prompt describing the scene and the camera poses. We introduce a novel framework that generates such videos in an online fashion by combining the generative power of a pre-trained text-to-image model with the geometric priors learned by a pre-trained monocular depth prediction model. To tackle the pivotal challenge of achieving 3D consistency, i.e., synthesizing videos that depict geometrically plausible scenes, we deploy online test-time training to encourage the predicted depth map of the current frame to be geometrically consistent with the synthesized scene. The depth maps are used to build a unified mesh representation of the scene, which is progressively constructed along the video generation process. In contrast to previous works, which…
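The abstract mentions that depth maps are turned into a mesh representation of the scene. As a rough illustration of that general idea, and not the authors' implementation, the sketch below unprojects a depth map into 3D vertices with a pinhole camera model and connects neighbouring pixels into triangles. The function names, the pinhole intrinsics (fx, fy, cx, cy), and the two-triangles-per-pixel-quad layout are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with the given depth to a 3D point in camera space.

    Assumes an ideal pinhole camera (hypothetical intrinsics fx, fy, cx, cy).
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def project(point: Vec3, fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-space 3D point back to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def depth_map_to_mesh(depth_map: Sequence[Sequence[float]],
                      fx: float, fy: float, cx: float, cy: float
                      ) -> Tuple[List[Vec3], List[Tri]]:
    """Build a simple triangle mesh from a dense depth map.

    One vertex per pixel (row-major order) and two triangles per 2x2
    pixel quad. This is an illustrative layout, not the paper's scheme.
    """
    height = len(depth_map)
    width = len(depth_map[0])

    vertices = [
        unproject(u, v, depth_map[v][u], fx, fy, cx, cy)
        for v in range(height)
        for u in range(width)
    ]

    triangles: List[Tri] = []
    for v in range(height - 1):
        for u in range(width - 1):
            i = v * width + u  # index of the quad's top-left vertex
            triangles.append((i, i + width, i + 1))
            triangles.append((i + 1, i + width, i + width + 1))
    return vertices, triangles


if __name__ == "__main__":
    # Toy 2x2 depth map: a flat wall two units in front of the camera.
    verts, tris = depth_map_to_mesh([[2.0, 2.0], [2.0, 2.0]],
                                    fx=1.0, fy=1.0, cx=0.5, cy=0.5)
    print(len(verts), "vertices,", len(tris), "triangles")
```

Rendering such a mesh from a new camera pose shows which parts of the next frame are already determined by existing geometry and which are still empty. The sketch leaves out everything specific to the method, such as the text-to-image model, the depth predictor, and the test-time training step.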
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis
