MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
Canyu Zhao, Mingyu Liu, Wen Wang, Weihua Chen, Fan Wang, Hao Chen, Bo Zhang, Chunhua Shen

TL;DR
MovieDreamer introduces a hierarchical framework combining autoregressive models and diffusion rendering to generate long, coherent, and high-quality videos with complex narratives and character consistency, advancing long-form video synthesis.
Contribution
It pioneers a hierarchical approach integrating autoregressive and diffusion models for extended, coherent video generation with detailed narrative and character continuity.
Findings
Generates longer videos than previous methods.
Demonstrates superior visual and narrative quality across genres.
Effectively maintains character consistency over extended sequences.
Abstract
Recent advancements in video generation have primarily leveraged diffusion models for short-duration content. However, these approaches often fall short in modeling complex narratives and maintaining character consistency over extended periods, which is essential for long-form video production like movies. We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering to pioneer long-duration video generation with intricate plot progressions and high visual fidelity. Our approach utilizes autoregressive models for global narrative coherence, predicting sequences of visual tokens that are subsequently transformed into high-quality video frames through diffusion rendering. This method is akin to traditional movie production processes, where complex stories are broken down into manageable scenes to be captured.…
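The two-stage idea in the abstract (an autoregressive model predicts a sequence of visual tokens, and a renderer then turns each token into a frame) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`predict_next_token`, `generate_tokens`, `render_keyframe`) is hypothetical, the "autoregressive model" is just a running mean plus noise, and the "renderer" is a simple iterative refinement loop standing in for diffusion rendering. It is only meant to show the control flow of such a hierarchy.

```python
import random
from typing import List

Token = List[float]  # hypothetical "visual token": a small vector per keyframe


def predict_next_token(history: List[Token], rng: random.Random) -> Token:
    """Toy autoregressive step (assumption): mean of all past tokens plus noise."""
    dim = len(history[0])
    mean = [sum(t[i] for t in history) / len(history) for i in range(dim)]
    return [m + rng.gauss(0.0, 0.1) for m in mean]


def generate_tokens(seed: Token, num_scenes: int, rng: random.Random) -> List[Token]:
    """Stage 1: roll out tokens one at a time, each conditioned on all earlier ones."""
    tokens = [seed]
    while len(tokens) < num_scenes:
        tokens.append(predict_next_token(tokens, rng))
    return tokens


def render_keyframe(token: Token, steps: int, rng: random.Random, size: int = 4) -> List[float]:
    """Stage 2: toy stand-in for diffusion rendering (not a real diffusion model).

    Starts from random noise and moves halfway toward a token-derived target
    at every step, loosely mimicking iterative denoising.
    """
    target = [token[i % len(token)] for i in range(size)]
    frame = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for _ in range(steps):
        frame = [f + 0.5 * (t - f) for f, t in zip(frame, target)]
    return frame


rng = random.Random(0)
tokens = generate_tokens(seed=[0.2, -0.1, 0.5], num_scenes=5, rng=rng)
frames = [render_keyframe(t, steps=20, rng=rng) for t in tokens]
print(len(tokens), "tokens ->", len(frames), "frames")
```

The point of the split is that long-range coherence is handled in the compact token sequence, while per-frame visual detail is left to the second stage, which only needs one token at a time.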
Taxonomy
Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Computer Graphics and Visualization Techniques
Methods: Diffusion
