Talking Face Generation with Multilingual TTS

Hyoung-Kyu Song; Sang Hoon Woo; Junhyeok Lee; Seungmin Yang; Hyunjae Cho; Youseong Lee; Dongho Choi; Kang-wook Kim

arXiv:2205.06421·cs.CV·May 16, 2022

TL;DR

This paper introduces a system that generates realistic multilingual talking face videos from text, maintaining speaker identity and lip synchronization, with applications in translation and dubbing.

Contribution

It presents a novel joint system combining multilingual TTS and talking face generation, capable of producing synchronized videos in multiple languages from text input.

Findings

01 Generates multilingual talking face videos in four languages (Korean, English, Japanese, and Chinese).

02 Maintains the speaker's vocal identity and lip synchronization across languages.

03 Demonstrates generalization across language families, since each of the four languages belongs to a different one.

Abstract

In this work, we propose a joint system combining a talking face generation system with a text-to-speech system that can generate multilingual talking face videos from text input alone. Our system can synthesize natural multilingual speech while maintaining the vocal identity of the speaker, as well as lip movements synchronized to the synthesized speech. We demonstrate the generalization capabilities of our system by selecting four languages (Korean, English, Japanese, and Chinese), each from a different language family. We also compare the outputs of our talking face generation model to the outputs of a prior work that claims multilingual support. For our demo, we add a translation API to the preprocessing stage and present it in the form of a neural dubber so that users can utilize the multilingual property of our system more easily.
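The abstract describes a three-stage flow for the demo: translation in preprocessing, multilingual text-to-speech, then talking face generation driven by the synthesized speech. The sketch below only illustrates how such stages could be chained. Every name in it (`dub`, `Speech`, `Video`, the language codes, and the three stub stages) is a hypothetical placeholder rather than the authors' code or any real API, and the stubs return dummy data instead of running real models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical language codes for the four languages named in the abstract.
SUPPORTED_LANGUAGES = ("ko", "en", "ja", "zh")


@dataclass
class Speech:
    samples: List[float]  # placeholder waveform
    sample_rate: int
    language: str


@dataclass
class Video:
    frames: List[str]  # placeholder frame identifiers
    audio: Speech


# Each stage is passed in as a callable so a real model could replace a stub.
Translator = Callable[[str, str], str]
Synthesizer = Callable[[str, str], Speech]
FaceGenerator = Callable[[Speech], List[str]]


def dub(
    text: str,
    target_language: str,
    translate: Translator,
    synthesize: Synthesizer,
    animate: FaceGenerator,
) -> Video:
    """Chain translation, TTS, and face generation (illustrative only)."""
    if target_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {target_language}")
    translated = translate(text, target_language)     # preprocessing stage
    speech = synthesize(translated, target_language)  # multilingual TTS stage
    frames = animate(speech)                          # speech-driven face stage
    return Video(frames=frames, audio=speech)


# Stub stages that return dummy data; they stand in for real models.
def fake_translate(text: str, language: str) -> str:
    return f"[{language}] {text}"


def fake_synthesize(text: str, language: str) -> Speech:
    # One silent sample per character, purely as a placeholder.
    return Speech(samples=[0.0] * len(text), sample_rate=16000, language=language)


def fake_animate(speech: Speech) -> List[str]:
    # One placeholder frame per sample, purely as a placeholder.
    return [f"frame-{i}" for i in range(len(speech.samples))]
```

Calling `dub("hello", "ko", fake_translate, fake_synthesize, fake_animate)` returns a `Video` whose audio is tagged with the target language. Passing the stages as arguments keeps the sketch independent of any particular translation service, TTS model, or face generator.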

Code & Models

Models

🤗 deepkyu/ml-talking-face (model · ♡ 10)

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing