Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling
Min Zhang, Zilin Wang, Liyan Chen, Kunhong Liu, Juncong Lin

TL;DR
Dialogue Director is a training-free multimodal framework that converts dialogue scripts into multi-view storyboards using large multimodal models and diffusion-based architectures.
Contribution
The paper introduces Dialogue Director, a training-free multimodal system that visualizes dialogue scripts using Chain-of-Thought reasoning, Retrieval-Augmented Generation, and multi-view synthesis.
Findings
Outperforms existing methods in script interpretation
Improves physical context understanding in storyboards
Applies cinematic principles more effectively
Abstract
Recent advances in AI-driven storytelling have enhanced video generation and story visualization. However, translating dialogue-centric scripts into coherent storyboards remains a significant challenge due to limited script detail, inadequate physical context understanding, and the complexity of integrating cinematic principles. To address these challenges, we propose Dialogue Visualization, a novel task that transforms dialogue scripts into dynamic, multi-view storyboards. We introduce Dialogue Director, a training-free multimodal framework comprising a Script Director, Cinematographer, and Storyboard Maker. This framework leverages large multimodal models and diffusion-based architectures, employing techniques such as Chain-of-Thought reasoning, Retrieval-Augmented Generation, and multi-view synthesis to improve script understanding, physical context comprehension, and cinematic…
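The three-component structure named in the abstract can be pictured as a simple staged pipeline. The following is a minimal sketch only, not the authors' implementation: every class, function, and rule name is hypothetical, each stage is a plain-Python stub standing in for a model call, and the pairing of a technique with a stage (reasoning for the Script Director, retrieval for the Cinematographer, image synthesis for the Storyboard Maker) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Line:
    speaker: str
    text: str


@dataclass
class Shot:
    speaker: str
    text: str
    description: str = ""  # filled by the script-director stub
    shot_type: str = ""    # filled by the cinematographer stub
    frame: str = ""        # filled by the storyboard-maker stub


# Hypothetical stand-in for a retrieval corpus of cinematic rules.
SHOT_RULES = {"question": "over-the-shoulder", "exclaim": "close-up"}


def script_director(lines):
    """Stub for a reasoning stage: attach a description to each dialogue line."""
    return [Shot(l.speaker, l.text, description=f"{l.speaker} says: {l.text}")
            for l in lines]


def cinematographer(shots):
    """Stub for a retrieval stage: look up a shot type from the rule table."""
    for shot in shots:
        if shot.text.endswith("?"):
            key = "question"
        elif shot.text.endswith("!"):
            key = "exclaim"
        else:
            key = None
        # Fall back to a default when no rule matches.
        shot.shot_type = SHOT_RULES.get(key, "medium shot")
    return shots


def storyboard_maker(shots):
    """Stub for an image stage: emit a text placeholder instead of a rendered frame."""
    for i, shot in enumerate(shots, start=1):
        shot.frame = f"[frame {i}: {shot.shot_type} of {shot.speaker}]"
    return shots


def dialogue_to_storyboard(lines):
    # Stages run in sequence; each one enriches the shared Shot records.
    return storyboard_maker(cinematographer(script_director(lines)))


script = [Line("Ann", "Where were you?"), Line("Bo", "I told you!"), Line("Ann", "Fine.")]
board = dialogue_to_storyboard(script)
for shot in board:
    print(shot.frame)
```

Because no stage needs training in this sketch, swapping a stub for a real model call would leave the pipeline shape unchanged, which is one way to read the "training-free" framing.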
Taxonomy
Topics: Persona Design and Applications · Speech and dialogue systems
