TL;DR
AuDirector is a novel self-reflective framework for immersive audio storytelling that enhances coherence, expressiveness, and user interactivity through a multi-agent closed-loop system.
Contribution
The paper introduces a self-reflective, multi-agent framework with novel modules for character-aware synthesis, self-correction, and human-guided refinement in audio storytelling.
Findings
Achieves superior coherence, expressiveness, and fidelity over baselines.
Effectively integrates natural language feedback for interactive refinement.
Demonstrates improved audio quality through systematic self-correction.
Abstract
Despite advances in text and visual generation, creating coherent long-form audio narratives remains challenging. Existing frameworks often suffer from mismatches between character settings and voice performance, insufficient self-correction mechanisms, and limited human interactivity. To address these challenges, we propose AuDirector, a self-reflective closed-loop multi-agent framework. Specifically, it includes an Identity-Aware Pre-production mechanism that transforms narrative texts into character profiles and utterance-level emotional instructions, which are used to retrieve suitable voice candidates and guide expressive speech synthesis, thereby promoting context-aligned voice adaptation. To enhance quality, a Collaborative Synthesis and Correction module introduces a closed-loop self-correction mechanism that systematically audits and regenerates defective audio components. Furthermore, a…
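The closed-loop self-correction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Segment` type, the `synthesize` and `audit` callables, and the `max_rounds` limit are all hypothetical names chosen for illustration, and the abstract does not specify how auditing is performed or when the loop stops. The sketch only shows the general pattern of auditing generated audio and regenerating the components flagged as defective.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One utterance-level audio component (hypothetical structure)."""
    utterance_id: int
    text: str
    audio: bytes = b""
    attempts: int = 0


def self_correct(
    segments: List[Segment],
    synthesize: Callable[[Segment], bytes],  # assumed: returns audio for a segment
    audit: Callable[[Segment], bool],        # assumed: True if the audio passes
    max_rounds: int = 3,                     # assumed stopping rule
) -> List[Segment]:
    """Synthesize every segment once, then regenerate only the flagged ones."""
    for seg in segments:
        seg.audio = synthesize(seg)
        seg.attempts = 1

    for _ in range(max_rounds):
        # Audit step: collect the segments that fail the check.
        defective = [seg for seg in segments if not audit(seg)]
        if not defective:
            break
        # Regeneration step: redo only the defective components.
        for seg in defective:
            seg.audio = synthesize(seg)
            seg.attempts += 1

    return segments
```

Regenerating only the flagged segments, rather than the whole narrative, keeps each correction round proportional to the number of defects. In a real system the `audit` callable would likely be a model-based check and could also return feedback to steer the next synthesis attempt; that detail is left out here because the source does not describe it.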
