Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu, Haoning Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

TL;DR
This paper introduces StoryGen, a novel auto-regressive model for open-ended visual storytelling that generates coherent image sequences from storylines, supported by a large-scale dataset and validated through experiments and human evaluations.
Contribution
The paper presents a new model, a large dataset, and a pipeline for visual storytelling, advancing the coherence and diversity of generated image sequences.
Findings
StoryGen outperforms baselines in coherence and character consistency
The dataset StorySalon enables diverse storytelling scenarios
Model generalizes to unseen characters without additional optimization
Abstract
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently. In this work, we focus on a novel yet challenging task: generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling. We make the following three contributions: (i) to fulfill the task of visual storytelling, we propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module that generates the current frame by conditioning on the corresponding text prompt and the preceding image-caption pairs; (ii) to address the data shortage of visual storytelling, we collect paired image-text sequences by sourcing from online videos and open-source E-books, establishing a processing pipeline for constructing a large-scale dataset…
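The auto-regressive conditioning described in contribution (i) can be pictured with a minimal sketch. It is not the authors' implementation: `Frame`, `generate_story`, and `toy_generator` are hypothetical names introduced here, and the toy generator stands in for the diffusion model, which is assumed to accept the current prompt plus the list of preceding image-caption pairs.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in for real image data; an actual model would return pixel tensors.
Image = List[float]


@dataclass
class Frame:
    """One story step: the text prompt and the image generated for it."""
    caption: str
    image: Image


def generate_story(
    storyline: List[str],
    generate_frame: Callable[[str, List[Frame]], Image],
) -> List[Frame]:
    """Generate frames one at a time, feeding every earlier
    image-caption pair back in as context for the next frame."""
    context: List[Frame] = []
    for prompt in storyline:
        # The generator sees the current prompt and all preceding frames.
        image = generate_frame(prompt, list(context))
        context.append(Frame(caption=prompt, image=image))
    return context


def toy_generator(prompt: str, context: List[Frame]) -> Image:
    """Hypothetical placeholder for the image model: encodes the prompt
    length and the number of context frames it was given."""
    return [float(len(prompt)), float(len(context))]


frames = generate_story(["a cat", "the cat sleeps"], toy_generator)
```

In this sketch the second frame is generated with one preceding image-caption pair in its context, the third with two, and so on, which is the property that lets a model keep characters consistent across the sequence.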
Taxonomy
Topics: Multimodal Machine Learning Applications · Digital Storytelling and Education · Video Analysis and Summarization
Methods: Diffusion · Focus
