ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
Yawen Luo, Xiaoyu Shi, Junhao Zhuang, Yutian Chen, Quande Liu, Xintao Wang, Pengfei Wan, and Tianfan Xue

TL;DR
ShotStream is a causal multi-shot video generation architecture for interactive, real-time storytelling that keeps shots coherent while generating with low latency.
Contribution
The paper presents a novel causal multi-shot architecture with dual-cache memory and a two-stage distillation process for interactive, real-time video storytelling.
Findings
Generates coherent multi-shot videos at 16 FPS on a single GPU.
Achieves inter-shot consistency and reduces error accumulation.
Matches or exceeds the quality of slower bidirectional models.
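To put the 16 FPS figure in perspective, the arithmetic below converts a frame rate into a per-frame time budget. This is only a back-of-the-envelope sketch: the function names and the 5-second shot length are illustrative assumptions, not values from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def generation_time_s(num_frames: int, fps: float) -> float:
    """Wall-clock seconds needed to produce num_frames at a sustained rate of fps."""
    return num_frames / fps


# 16 FPS is the throughput reported in the findings above.
REPORTED_FPS = 16.0
print(frame_budget_ms(REPORTED_FPS))  # 62.5

# Hypothetical 5-second shot at 16 frames per second (shot length is an assumption).
shot_frames = int(5 * REPORTED_FPS)
print(generation_time_s(shot_frames, REPORTED_FPS))  # 5.0
```

At a sustained 16 FPS, the generator has on average 62.5 ms per frame, so a clip played back at the same rate is produced about as fast as it is watched.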
Abstract
Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. By reformulating the task as next-shot generation conditioned on historical context, ShotStream allows users to dynamically instruct ongoing narratives via streaming prompts. We achieve this by first fine-tuning a text-to-video model into a bidirectional next-shot generator, which is then distilled into a causal student via Distribution Matching Distillation. To overcome the challenges of inter-shot consistency and error accumulation inherent in autoregressive generation, we introduce two key innovations. First, a dual-cache memory mechanism preserves visual coherence: a global context…
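The next-shot formulation described in the abstract can be pictured as a loop that consumes prompts one at a time and conditions each new shot on the shots already produced. The sketch below is a minimal illustration under stated assumptions: `generate_next_shot` is a hypothetical stand-in that returns placeholder frame labels rather than calling a real video model, and the `Shot` type, the `stream_story` function, and the frame count are invented for this example. It does not model the dual-cache memory or the distillation stages.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Shot:
    prompt: str
    frames: List[str]  # placeholder frame labels; a real model would emit image data


def generate_next_shot(prompt: str, history: List[Shot], frames_per_shot: int) -> Shot:
    """Hypothetical stand-in for a causal next-shot generator.

    It only records how many earlier shots were available as context.
    """
    index = len(history)
    frames = [f"shot{index}_frame{i}_ctx{len(history)}" for i in range(frames_per_shot)]
    return Shot(prompt=prompt, frames=frames)


def stream_story(prompts: Iterable[str], frames_per_shot: int = 4) -> Iterator[Shot]:
    """Yield one shot per incoming prompt, conditioning on all earlier shots."""
    history: List[Shot] = []
    for prompt in prompts:
        shot = generate_next_shot(prompt, history, frames_per_shot)
        history.append(shot)  # the new shot becomes context for the next one
        yield shot


if __name__ == "__main__":
    for shot in stream_story(["A knight enters a forest", "The knight finds a cave"]):
        print(shot.prompt, len(shot.frames))
```

Because `stream_story` is a generator, the prompts can themselves arrive lazily, which mirrors the idea of a user steering the narrative between shots.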
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Artificial Intelligence in Games
