VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

TL;DR
This paper introduces a speech-aware machine translation system for video dubbing that explicitly controls the length of translated speech by considering speech duration, leading to better alignment with the original video timing.
Contribution
The proposed system directly incorporates speech duration information into translation, improving length control for video dubbing over traditional word- or character-based methods.
Findings
Achieves superior length control in translated speech compared to baselines.
Demonstrates effectiveness across four language pairs.
Constructs a real-world dataset for comprehensive evaluation.
Abstract
Video dubbing aims to translate the original speech in a film or television program into speech in a target language, which can be achieved with a cascaded system consisting of speech recognition, machine translation, and speech synthesis. To ensure that the translated speech is well aligned with the corresponding video, its length/duration should be as close as possible to that of the original speech, which requires strict length control. Previous works usually constrain the number of words or characters generated by the machine translation model to be similar to that of the source sentence, without considering the isochronicity of speech: the speech duration of words/characters varies across languages. In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in…
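The abstract's central point, that matching character counts does not guarantee matching speech durations, can be illustrated with a minimal Python sketch. It is not the paper's model: VideoDubber considers token durations inside the translation system itself, whereas this sketch only reranks a fixed list of candidate translations. The candidate texts, the per-token durations, and the helper names `Candidate`, `pick_by_characters`, and `pick_by_duration` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate translation with hypothetical per-token speech durations (seconds)."""
    text: str
    token_durations: list[float]

    @property
    def duration(self) -> float:
        # Estimated length of the synthesized speech for this candidate.
        return sum(self.token_durations)


def pick_by_characters(source_text: str, candidates: list[Candidate]) -> Candidate:
    """Baseline: choose the candidate whose character count is closest to the source."""
    return min(candidates, key=lambda c: abs(len(c.text) - len(source_text)))


def pick_by_duration(source_duration: float, candidates: list[Candidate]) -> Candidate:
    """Speech-aware: choose the candidate whose estimated duration is closest to the source speech."""
    return min(candidates, key=lambda c: abs(c.duration - source_duration))


if __name__ == "__main__":
    source_text = "Hallo zusammen"  # 14 characters
    source_duration = 1.0           # seconds of original speech (made-up value)
    candidates = [
        Candidate("Hello everybody", [0.4, 0.7]),     # 15 chars, ~1.1 s
        Candidate("Hi all", [0.3, 0.4]),              # 6 chars, ~0.7 s
        Candidate("Hello, everyone!", [0.45, 0.55]),  # 16 chars, ~1.0 s
    ]
    print(pick_by_characters(source_text, candidates).text)   # Hello everybody
    print(pick_by_duration(source_duration, candidates).text)  # Hello, everyone!
```

With these made-up numbers, the character-count criterion selects a candidate whose speech would run about 0.1 s long, while the duration criterion selects the one that matches the original timing. In a real pipeline the per-token durations would have to come from some duration predictor, which is assumed here rather than shown.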
Taxonomy
Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Handwritten Text Recognition Techniques
Methods: Test
