Imagining from Images with an AI Storytelling Tool
Edirlei Soares de Lima, Marco A. Casanova, Antonio L. Furtado

TL;DR
This paper introduces ImageTeller, an AI storytelling tool that generates narratives from single images or image sequences, using GPT-4o to interpret the visual content and write the story and a Stable Diffusion XL model to illustrate it, with support for user interaction and genre customization.
Contribution
The paper presents a novel multimodal storytelling system combining GPT-4o and Stable Diffusion XL, with an interactive interface for user-guided narrative generation from images.
Findings
Effective generation of stories from images demonstrated
User interaction enhances narrative customization
Supports multiple genres and user inputs
Abstract
A method for generating narratives by analyzing single images or image sequences is presented, inspired by the time-immemorial tradition of Narrative Art. The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories, which are illustrated by a Stable Diffusion XL model. The method is supported by a fully implemented tool, called ImageTeller, which accepts images from diverse sources as input. Users can guide the narrative's development according to the conventions of fundamental genres (such as Comedy, Romance, Tragedy, Satire or Mystery), opt to generate data-driven stories, or leave the prototype free to decide how to handle the narrative structure. User interaction is provided throughout the generation process, allowing the user to request alternative chapters or illustrations, and even reject and restart the story…
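The interaction loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not ImageTeller's actual implementation: the names `tell_story`, `build_prompt`, `Chapter`, and `Story`, the prompt wording, the "accept" / "retry" / "restart" verdicts, and the one-chapter-per-image layout are all assumptions made for the example. The model calls are passed in as plain callables (hypothetical wrappers around a vision-language model and an image generator), so the sketch makes no claim about any real API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Genres named in the abstract; None means "let the model decide the structure".
GENRES = ("Comedy", "Romance", "Tragedy", "Satire", "Mystery")


@dataclass
class Chapter:
    text: str
    illustration: str  # hypothetical: a path or URL returned by the image model


@dataclass
class Story:
    genre: Optional[str]
    chapters: List[Chapter] = field(default_factory=list)


def build_prompt(description: str, genre: Optional[str], previous: List[Chapter]) -> str:
    """Assemble a text prompt for the next chapter (wording is an assumption)."""
    style = (
        f"Follow the conventions of {genre}."
        if genre
        else "Choose the narrative structure freely."
    )
    recap = " ".join(c.text for c in previous)
    return (
        f"Image description: {description}\n"
        f"Story so far: {recap}\n"
        f"{style}\nWrite the next chapter."
    )


def tell_story(
    images: List[str],
    genre: Optional[str],
    describe: Callable[[str], str],     # assumed wrapper: image -> description
    write: Callable[[str], str],        # assumed wrapper: prompt -> chapter text
    illustrate: Callable[[str], str],   # assumed wrapper: chapter text -> image ref
    review: Callable[[Chapter], str],   # user verdict: "accept", "retry", "restart"
    max_calls: int = 20,
) -> Story:
    if genre is not None and genre not in GENRES:
        raise ValueError(f"unknown genre: {genre}")
    story = Story(genre)
    i = 0
    calls = 0
    while i < len(images):
        if calls >= max_calls:
            raise RuntimeError("generation budget exhausted")
        calls += 1
        text = write(build_prompt(describe(images[i]), genre, story.chapters))
        chapter = Chapter(text, illustrate(text))
        verdict = review(chapter)
        if verdict == "accept":
            story.chapters.append(chapter)
            i += 1
        elif verdict == "restart":
            # Reject the whole story and start again from the first image.
            story.chapters.clear()
            i = 0
        # Any other verdict (e.g. "retry") regenerates the same chapter.
    return story
```

Passing the models in as callables keeps the control flow testable with stubs; the `max_calls` budget is only a safeguard added for this sketch so that repeated rejections cannot loop forever.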
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Artificial Intelligence Applications
Methods: OPT · Diffusion
