Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning
Rui Liu, Yuan Zhao, Zhenqi Jia

TL;DR
This paper introduces Authentic-Dubber, a novel approach to movie dubbing that simulates director-actor interactions using multimodal retrieval and progressive speech generation, leading to more emotionally authentic dubbing results.
Contribution
It proposes a new retrieve-augmented learning scheme with multimodal reference libraries, emotion-based retrieval, and progressive graph-based speech synthesis for authentic dubbing.
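The emotion-based retrieval step can be pictured as a generic top-k nearest-neighbour search over a reference library. This is a minimal sketch under stated assumptions, not the authors' implementation. The `ReferenceClip` entries, the 3-dimensional emotion embeddings, the use of cosine similarity, and all helper names are hypothetical. The paper's actual retrieval criterion and embedding model may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class ReferenceClip:
    # Hypothetical library entry: a clip ID plus an emotion embedding.
    # How the embedding is produced is an assumption, not from the paper.
    clip_id: str
    emotion_emb: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve_top_k(query: list[float], library: list[ReferenceClip], k: int = 2) -> list[tuple[str, float]]:
    """Return (clip_id, score) for the k clips most similar to the query emotion."""
    scored = [(clip.clip_id, cosine(query, clip.emotion_emb)) for clip in library]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy library with made-up embeddings (illustrative only).
library = [
    ReferenceClip("angry_01", [0.9, 0.1, 0.0]),
    ReferenceClip("sad_01", [0.0, 0.2, 0.9]),
    ReferenceClip("angry_02", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # hypothetical emotion embedding of the target scene
print([cid for cid, _ in retrieve_top_k(query, library, k=2)])  # ['angry_01', 'angry_02']
```

The retrieved clips would then serve as the emotional cues passed on to the synthesis stage; how the paper fuses them (its progressive graph-based step) is not modelled here.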
Findings
Improved emotional expressiveness in dubbing results.
Effective retrieval of relevant multimodal emotional cues.
Validated on the V2C Animation benchmark with positive subjective and objective results.
Abstract
The automatic movie dubbing model generates vivid speech from given scripts, replicating a speaker's timbre from a brief timbre prompt while ensuring lip-sync with the silent video. Existing approaches simulate a simplified workflow where actors dub directly without preparation, overlooking the critical director-actor interaction. In contrast, authentic workflows involve a dynamic collaboration: directors actively engage with actors, guiding them to internalize the context cues, specifically emotion, before performance. To address this issue, we propose a new Retrieve-Augmented Director-Actor Interaction Learning scheme to achieve authentic movie dubbing, termed Authentic-Dubber, which contains three novel mechanisms: (1) We construct a multimodal Reference Footage library to simulate the learning footage provided by directors. Note that we integrate Large Language Models (LLMs) to…
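One way to picture a multimodal reference library is as a table of clips keyed by a fused embedding. The sketch below assumes each clip already has separate video, audio, and text embeddings and fuses them by per-modality L2 normalization followed by concatenation. The fusion rule, the tuple layout, and every function name are hypothetical and not taken from the paper, whose library construction (including its use of LLMs) is only partly described in the abstract excerpt.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length; return it unchanged if it is all zeros."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def fuse_modalities(video_emb: list[float], audio_emb: list[float], text_emb: list[float]) -> list[float]:
    # Hypothetical fusion: normalize each modality so no single one dominates
    # by magnitude, then concatenate into one library key.
    return l2_normalize(video_emb) + l2_normalize(audio_emb) + l2_normalize(text_emb)


def build_library(clips: list[tuple[str, list[float], list[float], list[float]]]) -> dict[str, list[float]]:
    """clips: (clip_id, video_emb, audio_emb, text_emb) tuples -> {clip_id: fused key}."""
    return {clip_id: fuse_modalities(v, a, t) for clip_id, v, a, t in clips}


lib = build_library([("clip_a", [3.0, 4.0], [0.0, 2.0], [1.0, 0.0])])
print(lib["clip_a"])  # [0.6, 0.8, 0.0, 1.0, 1.0, 0.0]
```

Normalizing before concatenation is one simple design choice; a learned projection or attention-based fusion would be equally plausible alternatives.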
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Subtitles and Audiovisual Media
