Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos

Alexander Waibel; Moritz Behr; Fevziye Irem Eyiokur; Dogucan Yaman; Tuan-Nam Nguyen; Carlos Mullov; Mehmet Arif Demirtas; Alperen Kantarcı; Stefan Constantin; Hazım Kemal Ekenel

arXiv:2206.04523 · cs.CL · June 10, 2022

Open Access

TL;DR

This paper introduces an end-to-end neural system that translates videos into different languages while maintaining lip synchronization and voice characteristics of the original speaker.

Contribution

It presents an integrated pipeline for video translation that combines speech recognition with emphasis detection, translation, text-to-speech synthesis, voice conversion, and lip synchronization based on a conditional generative adversarial network (GAN).

Findings

01. System produces lip-synchronous, voice-preserving translated videos.

02. User study confirms realism and effectiveness of the translation.

03. Collected dataset supports future research in video translation.

Abstract

In this paper, we propose a neural end-to-end system for voice-preserving, lip-synchronous translation of videos. The system combines multiple component models and produces a video of the original speaker speaking in the target language that is lip-synchronous with the target speech, yet maintains the emphases in speech, the voice characteristics, and the face video of the original speaker. The pipeline starts with automatic speech recognition including emphasis detection, followed by a translation model. The translated text is then synthesized by a Text-to-Speech model that recreates the original emphases mapped from the original sentence. The resulting synthetic voice is then mapped back to the original speaker's voice using a voice conversion model. Finally, to synchronize the lips of the speaker with the translated audio, a conditional generative adversarial network-based model…
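The stage order described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every class, function, and parameter name below is hypothetical, each stage is a caller-supplied callable, and the stub stages return fixed placeholder values purely to show how data might flow from one stage to the next.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transcript:
    """Text plus the words flagged as emphasized (hypothetical structure)."""
    text: str
    emphasized: List[str]


@dataclass
class DubbingPipeline:
    """Chains five stages in the order given in the abstract.

    All stage names are assumptions made for illustration; the paper's
    actual interfaces are not specified in this summary.
    """
    recognize: Callable[[bytes], Transcript]        # ASR with emphasis detection
    translate: Callable[[Transcript], Transcript]   # translation, emphases mapped over
    synthesize: Callable[[Transcript], bytes]       # TTS that recreates emphases
    convert_voice: Callable[[bytes, bytes], bytes]  # (synthetic, reference) -> original voice
    sync_lips: Callable[[bytes, bytes], bytes]      # (video, audio) -> lip-synced video

    def run(self, video: bytes, audio: bytes) -> bytes:
        source = self.recognize(audio)
        target = self.translate(source)
        synthetic = self.synthesize(target)
        # The original audio serves as the voice reference for conversion.
        dubbed = self.convert_voice(synthetic, audio)
        return self.sync_lips(video, dubbed)


def stub_pipeline(log: List[str]) -> DubbingPipeline:
    """Build a pipeline from placeholder stages that only record call order."""

    def recognize(audio: bytes) -> Transcript:
        log.append("asr")
        return Transcript("hello world", ["world"])  # fixed placeholder output

    def translate(source: Transcript) -> Transcript:
        log.append("mt")
        return Transcript("hallo welt", ["welt"])  # fixed placeholder output

    def synthesize(target: Transcript) -> bytes:
        log.append("tts")
        return target.text.encode("utf-8")

    def convert_voice(synthetic: bytes, reference: bytes) -> bytes:
        log.append("vc")
        return synthetic  # a real model would change the timbre here

    def sync_lips(video: bytes, audio: bytes) -> bytes:
        log.append("lipsync")
        return video + b"|" + audio  # stands in for a generated video

    return DubbingPipeline(recognize, translate, synthesize, convert_voice, sync_lips)
```

Calling `stub_pipeline(log).run(video, audio)` with an empty list `log` runs the stages in sequence and records their order in `log`. Keeping each stage behind a plain callable is one possible design choice; it lets a single component be swapped out without touching the others.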

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis

Methods: Test