Cut2Next: Generating Next Shot via In-Context Tuning
Jingwen He, Hongbo Liu, Jiajun Li, Ziqi Huang, Yu Qiao, Wanli Ouyang, Ziwei Liu

TL;DR
Cut2Next is a Diffusion Transformer (DiT)-based framework that generates the next shot in a film sequence while following professional editing patterns and preserving cinematic continuity, which improves narrative flow and visual coherence.
Contribution
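The paper presents an in-context tuning approach, guided by a Hierarchical Multi-Prompting strategy and supported by architectural innovations, for cinematic shot generation. It targets a limitation of existing methods, which tend to prioritize basic visual consistency over editing patterns such as shot/reverse shot and cutaways.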
Findings
Outperforms baselines in visual consistency and text fidelity
User studies favor Cut2Next for editing pattern adherence
Maintains cinematic continuity effectively
Abstract
Effective multi-shot generation demands purposeful, film-like transitions and strict cinematic continuity. Current methods, however, often prioritize basic visual consistency, neglecting crucial editing patterns (e.g., shot/reverse shot, cutaways) that drive narrative flow for compelling storytelling. This yields outputs that may be visually coherent but lack narrative sophistication and true cinematic integrity. To bridge this, we introduce Next Shot Generation (NSG): synthesizing a subsequent, high-quality shot that critically conforms to professional editing patterns while upholding rigorous cinematic continuity. Our framework, Cut2Next, leverages a Diffusion Transformer (DiT). It employs in-context tuning guided by a novel Hierarchical Multi-Prompting strategy. This strategy uses Relational Prompts to define overall context and inter-shot editing styles. Individual Prompts then…
