Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning
Zhengjian Yao, Yongzhi Li, Xinyuan Gao, Quan Chen, Peng Jiang, Yanye Lu

TL;DR
Narrative Weaver is a comprehensive framework for controllable, long-range, and coherent visual content generation with multi-modal conditioning. It addresses a key limitation of existing models: they struggle to maintain narrative coherence and visual consistency across extended sequences.
Contribution
The paper introduces a novel architecture that combines multimodal narrative planning, a dynamic memory bank, and a multi-stage training strategy. It also introduces the first dataset for evaluating long-range visual storytelling.
Findings
Achieves state-of-the-art performance in multi-scene visual generation
Demonstrates effective long-range coherence in narrative visual content
Provides a new dataset for benchmarking long-range visual storytelling
Abstract
We present "Narrative Weaver", a novel framework that addresses a fundamental challenge in generative AI: achieving multi-modal controllable, long-range, and consistent visual content generation. While existing models excel at generating high-fidelity short-form visual content, they struggle to maintain narrative coherence and visual consistency across extended sequences. This is a critical limitation for real-world applications such as filmmaking and e-commerce advertising. Narrative Weaver introduces the first holistic solution that seamlessly integrates three essential capabilities: fine-grained control, automatic narrative planning, and long-range coherence. Our architecture combines a Multimodal Large Language Model (MLLM) for high-level narrative planning with a novel fine-grained control module featuring a dynamic Memory Bank that prevents visual drift. To enable practical deployment,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games
