MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement

Alexander Richard; Michael Zollhoefer; Yandong Wen; Fernando de la Torre; Yaser Sheikh

arXiv:2104.08223·cs.CV·May 23, 2022

TL;DR

MeshTalk introduces a novel cross-modality disentanglement method for generating realistic 3D facial animations from speech, capturing both lip movements and uncorrelated facial motions with high accuracy and realism.

Contribution

The paper proposes a generic audio-driven 3D face animation approach using a categorical latent space and a novel cross-modality loss, enabling scalable and realistic facial animation from speech.

Findings

1. Outperforms baseline methods in quality metrics

2. Achieves state-of-the-art realism in qualitative evaluations

3. Perceptual study favors MeshTalk in over 75% of cases

Abstract

This paper presents a generic method for generating full facial 3D animation from speech. Existing approaches to audio-driven facial animation exhibit uncanny or static upper face animation, fail to produce accurate and plausible co-articulation, or rely on person-specific models that limit their scalability. To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face. At the core of our approach is a categorical latent space for facial animation that disentangles audio-correlated and audio-uncorrelated information based on a novel cross-modality loss. Our approach ensures highly accurate lip motion, while also synthesizing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion. We demonstrate that our…

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Face and Expression Recognition