Imagining from Images with an AI Storytelling Tool

Edirlei Soares de Lima; Marco A. Casanova; Antonio L. Furtado

arXiv:2408.11517 · cs.CL · August 22, 2024 · 2 cites

Open Access

TL;DR

This paper introduces ImageTeller, an AI storytelling tool that uses GPT-4o to generate narratives from single images or image sequences and Stable Diffusion XL to illustrate them, with support for user interaction and genre customization.

Contribution

The paper presents a novel multimodal storytelling system combining GPT-4o and Stable Diffusion XL, with an interactive interface for user-guided narrative generation from images.

Findings

01 Effective generation of stories from images demonstrated

02 User interaction enhances narrative customization

03 Supports multiple genres and user inputs

Abstract

A method for generating narratives by analyzing single images or image sequences is presented, inspired by the time-immemorial tradition of Narrative Art. The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories, which are illustrated by a Stable Diffusion XL model. The method is supported by a fully implemented tool, called ImageTeller, which accepts images from diverse sources as input. Users can guide the narrative's development according to the conventions of fundamental genres (such as Comedy, Romance, Tragedy, Satire or Mystery), opt to generate data-driven stories, or leave the prototype free to decide how to handle the narrative structure. User interaction is provided throughout the generation process, allowing the user to request alternative chapters or illustrations, and even to reject and restart the story…
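The interaction described in the abstract (accept a chapter, request an alternative, or reject and restart) can be pictured with a minimal Python sketch. This is not ImageTeller's implementation: the function names `build_prompt` and `tell_story`, the prompt wording, and the decision labels `"accept"`, `"alternative"` and `"restart"` are all hypothetical. The model call is passed in as a plain callable so that no particular GPT-4o or Stable Diffusion XL API is assumed, and illustration generation is left out.

```python
from typing import Callable, Iterable, List, Optional

# Genres named in the abstract; the tuple itself is an illustrative assumption.
GENRES = ("Comedy", "Romance", "Tragedy", "Satire", "Mystery")


def build_prompt(genre: Optional[str], chapters_so_far: List[str]) -> str:
    """Build a hypothetical text prompt for the next chapter."""
    if genre is not None and genre not in GENRES:
        raise ValueError(f"unknown genre: {genre}")
    if genre:
        style = f"Follow the conventions of {genre}."
    else:
        # No genre chosen: leave the narrative structure to the model.
        style = "Choose the narrative structure freely."
    number = len(chapters_so_far) + 1
    return f"Write chapter {number} of a story based on the given images. {style}"


def tell_story(
    images: List[str],
    genre: Optional[str],
    n_chapters: int,
    write_chapter: Callable[[str, List[str]], str],
    decisions: Iterable[str],
) -> List[str]:
    """Generate chapters one at a time, applying one user decision per draft.

    write_chapter stands in for a multimodal model call (hypothetical).
    decisions yields "accept", "alternative" or "restart"; once it is
    exhausted, every further draft is accepted.
    """
    chapters: List[str] = []
    pending = iter(decisions)
    while len(chapters) < n_chapters:
        draft = write_chapter(build_prompt(genre, chapters), images)
        choice = next(pending, "accept")
        if choice == "accept":
            chapters.append(draft)
        elif choice == "restart":
            chapters.clear()  # reject the whole story and begin again
        elif choice != "alternative":
            raise ValueError(f"unknown decision: {choice}")
        # "alternative": the draft is discarded and the same chapter is retried
    return chapters
```

In this sketch a real deployment would replace `write_chapter` with a call to a multimodal model and add a parallel step that produces an illustration for each accepted chapter.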

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Artificial Intelligence Applications

Methods: OPT · Diffusion