MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

Kim Sung-Bin; Lee Chae-Yeon; Gihun Son; Oh Hyun-Bin; Janghoon Ju; Suekyeong Nam; Tae-Hyun Oh

arXiv:2406.14272·cs.CV·June 21, 2024



TL;DR

This paper introduces MultiTalk, a model for generating 3D talking heads across multiple languages, supported by a new multilingual video dataset, improving lip-sync accuracy in diverse linguistic contexts.

Contribution

The work presents a new multilingual dataset and a model that incorporates language-specific style embeddings to enhance 3D talking head generation across languages.
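To make the idea of a language-specific style embedding concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the language list, the dimensions, the random initialization, and the choice to concatenate the style vector onto each audio frame are all assumptions made for illustration. In a trained model the table would hold learned parameters rather than random values.

```python
import random

# Hypothetical subset of languages and a toy embedding size (assumptions).
LANGUAGES = ["english", "korean", "spanish"]
STYLE_DIM = 4


def build_style_table(languages, dim, seed=0):
    """Create one style vector per language.

    Random values stand in for parameters that a real model would learn.
    """
    rng = random.Random(seed)
    return {
        lang: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for lang in languages
    }


def condition_on_language(audio_features, language, style_table):
    """Append the language's style vector to every per-frame feature vector.

    Concatenation is one simple conditioning choice (an assumption here);
    the result would then be passed to a motion decoder.
    """
    style = style_table[language]
    return [frame + style for frame in audio_features]


style_table = build_style_table(LANGUAGES, STYLE_DIM)

# Two frames of toy 3-dimensional audio features (hypothetical values).
audio_features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
conditioned = condition_on_language(audio_features, "korean", style_table)
```

Each conditioned frame keeps its original audio features and gains the same per-language vector, so a downstream decoder can vary mouth motion by language while the audio input stays unchanged.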

Findings

01

Significant improvement in multilingual lip-sync accuracy.

02

Introduction of a new multilingual 2D video dataset with over 420 hours of talking videos in 20 languages.

03

Effective incorporation of language-specific style embeddings.

Abstract

Recent studies in speech-driven 3D talking head generation have achieved convincing results in verbal articulations. However, lip-sync accuracy degrades when these models are applied to input speech in other languages, possibly due to the lack of datasets covering a broad spectrum of facial movements across languages. In this work, we introduce a novel task to generate 3D talking heads from speech in diverse languages. We collect a new multilingual 2D video dataset comprising over 420 hours of talking videos in 20 languages. With our proposed dataset, we present a multilingually enhanced model that incorporates language-specific style embeddings, enabling it to capture the unique mouth movements associated with each language. Additionally, we present a metric for assessing lip-sync accuracy in multilingual settings. We demonstrate that training a 3D talking head model with our proposed…


Code & Models

Models

🤗 ameerazam08/MultiTalk-Code (model)


Taxonomy

Topics: Face recognition and analysis · Human Pose and Action Recognition · Multimodal Machine Learning Applications