DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

Jiawei Liu; Junqiao Li; Jiangfan Deng; Gen Li; Siyu Zhou; Zetao Fang; Shanshan Lao; Zengde Deng; Jianing Zhu; Tingting Ma; Jiayi Li; Yunqiu Wang; Qian He; Xinglong Wu

arXiv:2512.21252 · cs.CV · December 29, 2025

Open Access

TL;DR

DreaMontage is a novel framework for one-shot video generation that produces seamless, expressive, and long-duration videos guided by arbitrary frames, overcoming previous limitations in visual coherence and cinematic quality.

Contribution

The paper introduces a comprehensive approach that combines a lightweight intermediate-conditioning mechanism, a high-quality dataset, and a segment-wise autoregressive strategy to improve one-shot video synthesis.

Findings

01. Achieves visually striking, coherent one-shot videos.

02. Improves success rate and usability with a Tailored DPO scheme.

03. Enables long-duration video generation with memory-efficient inference.
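
The idea behind segment-wise autoregressive generation can be illustrated with a minimal sketch. This is not the paper's implementation: `toy_segment_model`, `generate_segmentwise`, and the `segment_len` and `overlap` parameters are hypothetical names chosen for illustration, and frames are toy lists of floats rather than real latents. The sketch only shows the general pattern of producing a long sequence one segment at a time while carrying forward a short tail of frames as conditioning, so that memory use depends on the segment length rather than the full video length.

```python
from typing import Callable, Iterator, List

Frame = List[float]  # toy stand-in for a video frame or latent (assumption)


def toy_segment_model(context: List[Frame], num_frames: int) -> List[Frame]:
    """Hypothetical stand-in for a video model.

    Continues from the last context frame by adding 1, 2, ... to each value.
    """
    last = context[-1]
    return [[v + (i + 1) for v in last] for i in range(num_frames)]


def generate_segmentwise(
    first_frame: Frame,
    total_frames: int,
    segment_len: int,
    overlap: int,
    model: Callable[[List[Frame], int], List[Frame]] = toy_segment_model,
) -> Iterator[List[Frame]]:
    """Yield segments one at a time, conditioning each on the previous tail."""
    if overlap < 1 or segment_len < 1:
        raise ValueError("overlap and segment_len must be at least 1")
    context = [first_frame]
    produced = 0
    while produced < total_frames:
        n = min(segment_len, total_frames - produced)
        segment = model(context, n)
        produced += n
        # Carry forward only a short tail as conditioning for the next segment.
        context = segment[-overlap:]
        yield segment


if __name__ == "__main__":
    for seg in generate_segmentwise([0.0], total_frames=10, segment_len=4, overlap=2):
        print(len(seg), seg[0], seg[-1])
```

Because the function is a generator, a caller can write each segment to disk as it arrives instead of holding the whole video in memory. A real system would replace `toy_segment_model` with an actual video model call.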

Abstract

The "one-shot" technique represents a distinct and sophisticated aesthetic in filmmaking. However, its practical realization is often hindered by prohibitive costs and complex real-world constraints. Although emerging video generation models offer a virtual alternative, existing approaches typically rely on naive clip concatenation, which frequently fails to maintain visual smoothness and temporal coherence. In this paper, we introduce DreaMontage, a comprehensive framework designed for arbitrary frame-guided generation, capable of synthesizing seamless, expressive, and long-duration one-shot videos from diverse user-provided inputs. To achieve this, we address the challenge through three primary dimensions. (i) We integrate a lightweight intermediate-conditioning mechanism into the DiT architecture. By employing an Adaptive Tuning strategy that effectively leverages base training data,…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Human Motion and Animation