KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

Zhihao Xu; Shengjie Gong; Jiapeng Tang; Lingyu Liang; Yining Huang; Haojie Li; and Shuangping Huang

arXiv:2409.01113 · cs.CV · September 4, 2024

TL;DR

KMTalk introduces a progressive learning framework that uses key motion embeddings to synthesize realistic 3D facial animations from audio, improving lip synchronization and temporal coherence over previous methods.

Contribution

The paper proposes a novel key motion embedding approach with a progressive learning scheme for more accurate and consistent 3D facial animation from audio signals.

Findings

01 Outperforms existing methods in realism and synchronization

02 Enhances temporal coherence in 3D facial animations

03 Demonstrates robustness across diverse speech datasets

Abstract

We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key…
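The two-stage idea described above (obtain motion at a sparse set of key frames first, then fill in the rest of the sequence) can be illustrated with a minimal sketch. This is not the paper's method: the function name `complete_from_keyframes`, its inputs, and the use of plain linear interpolation in place of the learned cross-modal completion module are all assumptions made for illustration only.

```python
import numpy as np

def complete_from_keyframes(key_idx, key_motion, n_frames):
    """Fill in a full motion sequence from sparse key frames.

    key_idx:    sorted 1-D sequence of key-frame indices (hypothetical input)
    key_motion: (len(key_idx), D) motion vectors at those key frames
    n_frames:   length of the full output sequence

    Linear interpolation is only a stand-in here; the paper describes a
    learned, audio-conditioned completion module instead.
    """
    key_idx = np.asarray(key_idx, dtype=float)
    key_motion = np.asarray(key_motion, dtype=float)
    frames = np.arange(n_frames, dtype=float)
    # Interpolate each motion dimension independently over time.
    cols = [np.interp(frames, key_idx, key_motion[:, d])
            for d in range(key_motion.shape[1])]
    return np.stack(cols, axis=1)  # shape: (n_frames, D)

# Toy usage: two key frames with 2-D motion vectors, five frames in total.
full = complete_from_keyframes([0, 4], [[0.0, 1.0], [4.0, 3.0]], 5)
```

In this toy setup the key frames pin down the sequence at fixed points and the in-between frames are derived from them, which is the sense in which sparse key motions reduce what the second stage has to predict.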

Code & Models

Repositories

ffxzh/kmtalk (Official)

Taxonomy

Topics: Face recognition and analysis · Human Motion and Animation