Node-Based Editing for Multimodal Generation of Text, Audio, Image, and Video

Alexander Htet Kyaw; Lenin Ravindranath Sivalingam

arXiv:2511.03227 · cs.HC · November 7, 2025

Open Access

TL;DR

This paper introduces a node-based storytelling system enabling multimodal content creation through graph-based story representation, supporting targeted editing, iterative refinement, and integration of text, images, audio, and video.

Contribution

The work presents a novel node-based interface for multimodal storytelling that allows flexible editing and generation of narratives across multiple media types.

Findings

01. Node-based editing supports control over narrative structure.

02. The system enables iterative generation of text, images, audio, and video.

03. Quantitative outcomes are reported for automatic story outline generation.

Abstract

We present a node-based storytelling system for multimodal content generation. The system represents stories as graphs of nodes that can be expanded, edited, and iteratively refined through direct user edits and natural-language prompts. Each node can integrate text, images, audio, and video, allowing creators to compose multimodal narratives. A task selection agent routes between specialized generative tasks that handle story generation, node structure reasoning, node diagram formatting, and context generation. The interface supports targeted editing of individual nodes, automatic branching for parallel storylines, and node-based iterative refinement. Our results demonstrate that node-based editing supports control over narrative structure and iterative generation of text, images, audio, and video. We report quantitative outcomes on automatic story outline generation and qualitative…
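The graph representation and task routing described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `StoryNode`, `StoryGraph`, and `select_task` are hypothetical, and a simple keyword rule stands in for the task selection agent, whose actual mechanism the abstract does not describe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Media types a node can carry besides its text (taken from the abstract).
MEDIA_TYPES = ("image", "audio", "video")


@dataclass
class StoryNode:
    """One story beat; hypothetical structure, not the authors' schema."""
    node_id: str
    text: str
    media: Dict[str, str] = field(default_factory=dict)  # media type -> asset reference
    children: List[str] = field(default_factory=list)    # ids of follow-up nodes


class StoryGraph:
    """A story as a graph of nodes that can be expanded and edited."""

    def __init__(self) -> None:
        self.nodes: Dict[str, StoryNode] = {}

    def add_node(self, node_id: str, text: str, parent: Optional[str] = None) -> StoryNode:
        node = StoryNode(node_id, text)
        self.nodes[node_id] = node
        if parent is not None:
            # Two or more children under one parent form parallel storylines.
            self.nodes[parent].children.append(node_id)
        return node

    def edit_node(self, node_id: str, new_text: str) -> None:
        # Targeted edit: only this node changes, the rest of the graph is untouched.
        self.nodes[node_id].text = new_text

    def attach_media(self, node_id: str, media_type: str, asset_ref: str) -> None:
        if media_type not in MEDIA_TYPES:
            raise ValueError(f"unsupported media type: {media_type}")
        self.nodes[node_id].media[media_type] = asset_ref


def select_task(prompt: str) -> str:
    """Keyword stand-in (assumption) for the task selection agent."""
    p = prompt.lower()
    if "diagram" in p or "layout" in p:
        return "node_diagram_formatting"
    if "branch" in p or "structure" in p:
        return "node_structure_reasoning"
    if "context" in p or "background" in p:
        return "context_generation"
    return "story_generation"


# Usage: build a two-branch story, then edit one branch in isolation.
graph = StoryGraph()
graph.add_node("n1", "A lighthouse keeper finds a message in a bottle.")
graph.add_node("n2a", "She sails out to find the sender.", parent="n1")
graph.add_node("n2b", "She ignores it and the storm arrives.", parent="n1")
graph.edit_node("n2a", "She rows out at dawn to find the sender.")
graph.attach_media("n1", "image", "assets/lighthouse.png")
```

In this sketch, editing `n2a` leaves `n1` and `n2b` unchanged, which is the kind of targeted, per-node refinement the abstract describes. The four task names mirror the specialized tasks listed there.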

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Games · Digital Humanities and Scholarship