TL;DR
CausalCine is a novel autoregressive framework for real-time, multi-shot video narrative generation that maintains coherence across shots and supports interactive editing.
Contribution
CausalCine introduces a causal autoregressive model with Content-Aware Memory Routing and distillation for real-time, multi-shot video synthesis with dynamic prompts.
Findings
Outperforms autoregressive baselines in coherence and quality.
Approaches bidirectional models in capability.
Enables streaming, interactive video generation.
Abstract
Autoregressive video generation aims at real-time, open-ended synthesis. Yet, cinematic storytelling is not merely the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models often struggle in this setting. Trained primarily for short-horizon continuation, they treat long sequences as extended single shots, inevitably suffering from motion stagnation and semantic drift during long rollouts. To bridge this gap, we introduce CausalCine, an interactive autoregressive framework that transforms multi-shot video generation into an online directing process. CausalCine generates causally across shot changes, accepts dynamic prompts on the fly, and reuses context without regenerating previous shots. To achieve this, we first train a causal base model on native multi-shot sequences to…
