MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

Qinghe Wang; Xiaoyu Shi; Baolu Li; Weikang Bian; Quande Liu; Huchuan Lu; Xintao Wang; Pengfei Wan; Kun Gai; Xu Jia

arXiv:2512.03041 · cs.CV · December 3, 2025

TL;DR

MultiShotMaster introduces a controllable framework for multi-shot video generation that extends a pretrained single-shot video model with two novel RoPE (rotary positional embedding) variants, enabling flexible shot arrangement, narrative coherence, and scene customization.

Contribution

The work presents two new RoPE variants, one for shot transition control and one for spatiotemporal grounding, along with an automated data annotation pipeline for multi-shot videos.

Findings

01. Demonstrates superior controllability and performance in multi-shot video generation.
02. Enables flexible customization of shot count and shot duration.
03. Supports text-driven inter-shot consistency and scene customization.

Abstract

Current video generation techniques excel at single-shot clips but struggle to produce narrative multi-shot videos, which require flexible shot arrangement, a coherent narrative, and controllability beyond text prompts. To tackle these challenges, we propose MultiShotMaster, a framework for highly controllable multi-shot video generation. We extend a pretrained single-shot model by integrating two novel variants of RoPE. First, we introduce Multi-Shot Narrative RoPE, which applies an explicit phase shift at shot transitions, enabling flexible shot arrangement while preserving the temporal narrative order. Second, we design Spatiotemporal Position-Aware RoPE to incorporate reference tokens and grounding signals, enabling spatiotemporally grounded reference injection. In addition, to overcome data scarcity, we establish an automated data annotation pipeline to extract multi-shot videos,…
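The abstract describes Multi-Shot Narrative RoPE only at a high level. The sketch below illustrates one possible reading under a loudly stated assumption: the "explicit phase shift" is modeled as a fixed extra offset (the hypothetical parameter `shot_shift`) added to the temporal position index at every shot boundary, after which standard 1D RoPE rotation is applied to those indices. The function names `narrative_temporal_positions` and `apply_rope` are hypothetical; this is not the authors' implementation.

```python
import math


def narrative_temporal_positions(shot_lengths, shot_shift):
    """Hypothetical: global frame index plus `shot_shift` for every shot boundary crossed.

    ASSUMPTION: the paper's phase shift is modeled here as a constant
    positional offset per shot; the actual formulation may differ.
    """
    positions = []
    t = 0  # running frame index across the whole video
    for shot_idx, n_frames in enumerate(shot_lengths):
        for _ in range(n_frames):
            # Frames keep their narrative order; each new shot adds an extra jump.
            positions.append(t + shot_idx * shot_shift)
            t += 1
    return positions


def rope_angles(position, dim, base=10000.0):
    """Standard RoPE angles: position * base^(-2i/dim) for i = 0 .. dim/2 - 1."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(x, position, base=10000.0):
    """Rotate each consecutive pair (x[2i], x[2i+1]) by its RoPE angle."""
    if len(x) % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = []
    for i, angle in enumerate(rope_angles(position, len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(angle), math.sin(angle)
        out.extend([a * c - b * s, a * s + b * c])
    return out


if __name__ == "__main__":
    # Three shots of 3, 2 and 4 frames with a hypothetical shift of 10.
    print(narrative_temporal_positions([3, 2, 4], shot_shift=10))
```

For shots of 3, 2 and 4 frames with `shot_shift=10`, the positions come out as `[0, 1, 2, 13, 14, 25, 26, 27, 28]`: narrative order is preserved, but the gap across a shot boundary is larger than the step within a shot. Because RoPE attention scores depend only on position differences, a query/key pair that straddles a cut then sees a distinct relative offset from a pair inside the same shot, which is one way such a shift could let the model distinguish shot transitions.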

Code & Models

Models

🤗 KlingTeam/MultiShotMaster (Hugging Face model)

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications