Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models
Hyeonho Jeong, Gihyun Kwon, Jong Chul Ye

TL;DR
This paper introduces a novel neural pipeline that uses large language models and diffusion models to generate coherent storybooks from plain text, emphasizing zero-shot capabilities and semantic editing for improved storytelling coherence.
Contribution
It presents a zero-shot storybook generation method combining language and diffusion models with semantic editing, avoiding expensive training on image-caption pairs.
Findings
Outperforms state-of-the-art image editing baselines
Ensures coherence in generated story sequences
Uses simple textual inversion techniques for zero-shot generation
Abstract
Recent advancements in large-scale text-to-image models have opened new possibilities for guiding the creation of images through human-devised natural language. However, while prior literature has primarily focused on the generation of individual images, it is essential to consider the capability of these models to ensure coherency within a sequence of images to fulfill the demands of real-world applications such as storytelling. To address this, here we present a novel neural pipeline for generating a coherent storybook from the plain text of a story. Specifically, we leverage a combination of a pre-trained Large Language Model and a text-guided Latent Diffusion Model to generate coherent images. While previous story synthesis frameworks typically require a large-scale text-to-image model trained on expensive image-caption pairs to maintain coherency, we employ simple textual…
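The text-side planning stage of such a pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated here and does not give the details. The sketch assumes a naive sentence splitter as a stand-in for the language model, and a hypothetical placeholder token (`<protagonist>`) of the kind textual inversion binds to a learned embedding, so that every page prompt refers to the same character. No diffusion model is called, and all names (`CHARACTER_TOKEN`, `split_into_scenes`, `build_prompt`, `plan_storybook`) are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical pseudo-word; in textual inversion a token like this is bound
# to a learned embedding so that all pages depict the same character.
CHARACTER_TOKEN = "<protagonist>"


@dataclass
class Page:
    index: int   # 1-based page number
    text: str    # story text shown on the page
    prompt: str  # text prompt that would be passed to an image generator


def split_into_scenes(story: str) -> List[str]:
    """Naive stand-in for the language-model step: one scene per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", story.strip())
    return [p for p in parts if p]


def build_prompt(scene: str, character_name: str) -> str:
    """Swap the character's name for the shared token and add a style suffix."""
    pattern = rf"\b{re.escape(character_name)}\b"
    replaced = re.sub(pattern, CHARACTER_TOKEN, scene)
    return f"{replaced.rstrip('.!?')}, storybook illustration"


def plan_storybook(story: str, character_name: str) -> List[Page]:
    """Build one Page per scene, each with a prompt sharing the same token."""
    scenes = split_into_scenes(story)
    return [
        Page(index=i, text=scene, prompt=build_prompt(scene, character_name))
        for i, scene in enumerate(scenes, start=1)
    ]


if __name__ == "__main__":
    story = "Mia found a red kite. Mia ran to the hill! The kite flew away."
    for page in plan_storybook(story, "Mia"):
        print(page.index, page.prompt)
```

Reusing one token across all prompts is the design choice that matters in this sketch: the per-page text changes, while the reference to the character stays fixed.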
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques
Methods: Latent Diffusion Model · Diffusion
