ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions
Xiaoxue Wu, Xinyuan Chen, Yaohui Wang, Yu Qiao

TL;DR
ShotDirector is a framework for controllable multi-shot video generation that integrates camera parameters and hierarchical editing patterns to produce film-like transitions. It targets a gap in previous methods, which focus on low-level visual consistency across shots rather than on how transitions are designed.
Contribution
We introduce ShotDirector, a novel framework combining parameter-level camera control with hierarchical editing-pattern prompts for realistic, controllable multi-shot video synthesis.
Findings
Demonstrates effective control over shot transitions
Constructs the ShotWeaver40K dataset for training and evaluation
Outperforms existing methods in producing film-like editing patterns
Abstract
Shot transitions play a pivotal role in multi-shot video generation, as they determine the overall narrative expression and the directorial design of visual storytelling. However, recent progress has primarily focused on low-level visual consistency across shots, neglecting how transitions are designed and how cinematographic language contributes to coherent narrative expression. This often leads to mere sequential shot changes without intentional film-editing patterns. To address this limitation, we propose ShotDirector, an efficient framework that integrates parameter-level camera control and hierarchical editing-pattern-aware prompting. Specifically, we adopt a camera control module that incorporates 6-DoF poses and intrinsic settings to enable precise camera information injection. In addition, a shot-aware mask mechanism is employed to introduce hierarchical prompts aware of…
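The abstract describes the camera control signal as 6-DoF poses plus intrinsic settings. As a rough illustration of what those parameters encode, the following minimal sketch builds a rotation from three angles, pairs it with a translation to form a world-to-camera pose, and projects a 3D point through a pinhole intrinsic matrix. It is not the paper's camera control module: the function names, the yaw/pitch/roll parameterization, and the world-to-camera convention are all assumptions made for this example.

```python
import math

def rotation_from_euler(yaw, pitch, roll):
    """Hypothetical helper: R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def intrinsics(fx, fy, cx, cy):
    """Pinhole intrinsic matrix K (focal lengths and principal point, in pixels)."""
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]

def project(point_world, R, t, K):
    """Project a 3D world point to pixel coordinates.

    Assumes (R, t) map world coordinates to camera coordinates:
    x_cam = R @ x_world + t. The point must lie in front of the camera.
    """
    x_cam = [
        sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
        for i in range(3)
    ]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    u = K[0][0] * x_cam[0] / x_cam[2] + K[0][2]
    v = K[1][1] * x_cam[1] / x_cam[2] + K[1][2]
    return u, v

# Two shots of the same scene point, differing only in camera pose.
K = intrinsics(fx=100.0, fy=100.0, cx=50.0, cy=40.0)
point = [1.0, 0.0, 2.0]

shot_a = (rotation_from_euler(0.0, 0.0, 0.0), [0.0, 0.0, 0.0])
shot_b = (rotation_from_euler(math.pi / 2, 0.0, 0.0), [0.0, 0.0, 0.0])

uv_a = project(point, shot_a[0], shot_a[1], K)
uv_b = project(point, shot_b[0], shot_b[1], K)
```

The same scene point lands at different pixel locations in the two shots, which is the kind of geometric change between shots that a pose-plus-intrinsics signal can specify at the parameter level. The six degrees of freedom here are the three rotation angles and the three translation components.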
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Multimodal Machine Learning Applications
