Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs

Saeed Ghorbani

arXiv:2507.21893·cs.CV·August 6, 2025

TL;DR

Aether Weaver is an integrated multimodal narrative co-generation system that simultaneously creates text, visuals, scene graphs, and soundscapes, ensuring consistency and emotional resonance for immersive storytelling.

Contribution

The paper introduces a framework that generates the multimodal narrative components concurrently, using dynamic scene graphs and emotional coherence across modalities, with the aim of overcoming the limitations of traditional sequential text-to-visual pipelines.

Findings

01 Enhances narrative depth and visual fidelity.

02 Improves emotional resonance across modalities.

03 Outperforms baseline approaches in qualitative evaluations.

Abstract

We introduce Aether Weaver, a novel, integrated framework for multimodal narrative co-generation that overcomes limitations of sequential text-to-visual pipelines. Our system concurrently synthesizes textual narratives, dynamic scene graph representations, visual scenes, and affective soundscapes, driven by a tightly integrated co-generation mechanism. At its core, the Narrator, a large language model, generates narrative text and multimodal prompts, while the Director acts as a dynamic scene graph manager: it analyzes the text to build and maintain a structured representation of the story's world, ensuring spatio-temporal and relational consistency for visual rendering and subsequent narrative generation. Additionally, a Narrative Arc Controller guides the high-level story structure, influencing multimodal affective consistency, further complemented by an Affective Tone Mapper that…
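To make the Director's role more concrete, the following is a minimal Python sketch of a dynamic scene graph that is updated as the story advances. It is not the paper's implementation: the `SceneGraph` class, its `update` and `describe` methods, and the example entities are all hypothetical, and the sketch assumes that some upstream step has already extracted (subject, predicate, object) triples from the Narrator's text.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical story-world state: entities plus (subject, predicate) -> object edges."""

    entities: set = field(default_factory=set)
    relations: dict = field(default_factory=dict)

    def update(self, triples):
        """Apply triples assumed to be extracted from a new narrative passage.

        A later triple with the same (subject, predicate) overwrites the
        earlier one, so each subject holds one value per predicate.
        """
        for subj, pred, obj in triples:
            self.entities.update((subj, obj))
            self.relations[(subj, pred)] = obj

    def describe(self):
        """Render the current state as sorted text lines, e.g. for reuse in prompts."""
        return [f"{s} {p} {o}" for (s, p), o in sorted(self.relations.items())]


graph = SceneGraph()
graph.update([("Mira", "is in", "the lighthouse"), ("lantern", "is held by", "Mira")])
graph.update([("Mira", "is in", "the harbor")])  # Mira moves; her old location is replaced
state = graph.describe()
print(state)
```

Keying relations by (subject, predicate) is one simple way to keep the world state consistent across passages: when a character moves, the stale location is overwritten rather than accumulated, so whatever consumes `describe()` sees only the current scene.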
