DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior
Junjia Huang, Binbin Yang, Pengxiang Yan, Jiyang Liu, Bin Xia, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li

TL;DR
DreamShot is a novel video diffusion-based storyboard synthesis framework that generates coherent, narrative-driven shot sequences with consistent characters and scenes, supporting flexible text and reference inputs.
Contribution
DreamShot introduces a controllable multi-shot storyboard generation method that leverages video diffusion priors, together with a role-conditioning module that keeps character identity consistent across shots.
Findings
Outperforms state-of-the-art models in scene coherence and role consistency.
Supports both text-to-shot and reference-to-shot generation.
Produces visually and semantically coherent story sequences.
Abstract
Storyboard synthesis plays a crucial role in visual storytelling, aiming to generate coherent shot sequences that visually narrate cinematic events with consistent characters, scenes, and transitions. However, existing approaches are mostly adapted from text-to-image diffusion models, which struggle to maintain long-range temporal coherence, consistent character identities, and narrative flow across multiple shots. In this paper, we introduce DreamShot, a storyboard framework built on a video generative model that fully exploits powerful video diffusion priors for controllable multi-shot synthesis. DreamShot supports both Text-to-Shot and Reference-to-Shot generation, as well as story continuation conditioned on previous frames, enabling flexible and context-aware storyboard generation. By leveraging the spatial-temporal consistency inherent in video generative models, DreamShot produces…
