DreamJourney: Perpetual View Generation with Video Diffusion Models
Bo Pan, Yang Chen, Yingwei Pan, Ting Yao, Wei Chen, Tao Mei

TL;DR
DreamJourney is a novel two-stage framework that synthesizes long-term, dynamic 3D scene videos from a single image by combining 3D lifting, video diffusion models, and language-driven object animation for perpetual view generation.
Contribution
It introduces a two-stage process leveraging video diffusion and language models to generate dynamic, view-consistent videos with object movements from a single image, addressing limitations of static scene synthesis.
Findings
Outperforms state-of-the-art methods quantitatively.
Produces more coherent and dynamic scene videos.
Effectively captures object movements within 4D scenes.
Abstract
Perpetual view generation aims to synthesize a long-term video corresponding to an arbitrary camera trajectory solely from a single input image. Recent methods commonly utilize a pre-trained text-to-image diffusion model to synthesize new content of previously unseen regions along camera movement. However, the underlying 2D diffusion model lacks 3D awareness and results in distorted artifacts. Moreover, these methods are limited to generating views of static 3D scenes, neglecting to capture object movements within the dynamic 4D world. To alleviate these issues, we present DreamJourney, a two-stage framework that leverages the world simulation capacity of video diffusion models to trigger a new perpetual scene view generation task with both camera movements and object dynamics. Specifically, in stage I, DreamJourney first lifts the input image to a 3D point cloud and renders a sequence of partial…
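As a rough illustration of the "lift to a 3D point cloud, then render" idea in stage I, the sketch below unprojects each pixel with a per-pixel depth and reprojects it into a translated camera. It is not the authors' implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the translation-only camera move, the nearest-pixel splatting, and the function names `lift_to_points` and `render_partial` are all assumptions made for this example, and the excerpt does not say how depth is obtained.

```python
def lift_to_points(depth, fx, fy, cx, cy):
    """Unproject each pixel (u, v) with depth z into a camera-space point.

    Assumes a pinhole camera; returns (x, y, z, u, v) tuples so the
    source pixel can be looked up again when rendering.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, u, v))
    return points


def render_partial(image, depth, fx, fy, cx, cy, shift):
    """Reproject the lifted points into a camera translated by `shift`.

    Returns (partial, mask). `partial` holds the source pixel value where
    a point landed and None elsewhere; `mask` is True where a point landed.
    Rotation is omitted to keep the sketch short (an assumption).
    """
    h, w = len(depth), len(depth[0])
    partial = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    tx, ty, tz = shift
    for x, y, z, u, v in lift_to_points(depth, fx, fy, cx, cy):
        # Moving the camera by `shift` moves points by -shift in its frame.
        xn, yn, zn = x - tx, y - ty, z - tz
        if zn <= 0:
            continue  # point is behind the new camera
        un = round(fx * xn / zn + cx)
        vn = round(fy * yn / zn + cy)
        # Keep only the nearest point per pixel (simple z-buffer).
        if 0 <= un < w and 0 <= vn < h and zn < zbuf[vn][un]:
            zbuf[vn][un] = zn
            partial[vn][un] = image[v][u]
    mask = [[p is not None for p in row] for row in partial]
    return partial, mask


# Toy 2x3 "image" at constant depth; the camera moves one unit to the right.
image = [["a", "b", "c"], ["d", "e", "f"]]
depth = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
partial, mask = render_partial(image, depth, 1.0, 1.0, 0.0, 0.0, (1.0, 0.0, 0.0))
print(partial)  # [['b', 'c', None], ['e', 'f', None]]
print(mask)     # [[True, True, False], [True, True, False]]
```

In this toy case the rightmost column receives no point, so it shows up as a hole in the mask: a previously unseen region revealed by the camera movement, for which new content would have to be synthesized.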
Taxonomy
Methods: Diffusion
