DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation
Jiawei Liu, Junqiao Li, Jiangfan Deng, Gen Li, Siyu Zhou, Zetao Fang, Shanshan Lao, Zengde Deng, Jianing Zhu, Tingting Ma, Jiayi Li, Yunqiu Wang, Qian He, Xinglong Wu

TL;DR
DreaMontage is a novel framework for one-shot video generation that produces seamless, expressive, and long-duration videos guided by arbitrary frames, overcoming previous limitations in visual coherence and cinematic quality.
Contribution
It introduces a comprehensive approach combining a lightweight conditioning mechanism, high-quality dataset, and a segment-wise autoregressive strategy for improved one-shot video synthesis.
Findings
Achieves visually striking, coherent one-shot videos.
Enhances success rate and usability with Tailored DPO scheme.
Enables long-duration video generation with memory-efficient inference.
Abstract
The "one-shot" technique represents a distinct and sophisticated aesthetic in filmmaking. However, its practical realization is often hindered by prohibitive costs and complex real-world constraints. Although emerging video generation models offer a virtual alternative, existing approaches typically rely on naive clip concatenation, which frequently fails to maintain visual smoothness and temporal coherence. In this paper, we introduce DreaMontage, a comprehensive framework designed for arbitrary frame-guided generation, capable of synthesizing seamless, expressive, and long-duration one-shot videos from diverse user-provided inputs. To achieve this, we address the challenge through three primary dimensions. (i) We integrate a lightweight intermediate-conditioning mechanism into the DiT architecture. By employing an Adaptive Tuning strategy that effectively leverages base training data,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Human Motion and Animation
