MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
Alexander Richard, Michael Zollhoefer, Yandong Wen, Fernando de la Torre, Yaser Sheikh

TL;DR
MeshTalk introduces a novel cross-modality disentanglement method for generating realistic 3D facial animation from speech. It captures both accurate lip motion and facial motion that is uncorrelated with the audio, such as eye blinks and eyebrow motion.
Contribution
The paper proposes a generic audio-driven 3D face animation approach using a categorical latent space and a novel cross-modality loss, enabling scalable and realistic facial animation from speech.
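To make "categorical latent space" concrete, the sketch below maps per-head logits to one-hot codes with the Gumbel-softmax trick. It assumes a latent made of several independent categorical heads. Gumbel-softmax is one common way to sample from and train such a latent; this summary does not say how the paper does it. The function names, the toy sizes (4 heads, 8 categories) and the temperature are hypothetical illustration choices, not values from the paper. Only the forward sampling step is shown. A training setup would typically pass gradients through the hard sample with a straight-through estimator.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample from one categorical head (Gumbel-softmax)."""
    # Gumbel(0, 1) noise via -log(-log(U)); U is kept away from 0 and 1.
    noise = [-math.log(-math.log(rng.uniform(1e-10, 1.0 - 1e-10))) for _ in logits]
    scores = [(x + g) / tau for x, g in zip(logits, noise)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(probs):
    """Hard one-hot vector at the argmax of a probability vector."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def categorical_latent(head_logits, tau=0.5, seed=0):
    """Hypothetical encoder logits -> one one-hot code per head (forward pass only)."""
    rng = random.Random(seed)
    return [to_one_hot(gumbel_softmax(logits, tau, rng)) for logits in head_logits]


# Toy example: 4 heads, each choosing one of 8 categories.
init = random.Random(1)
head_logits = [[init.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
codes = categorical_latent(head_logits)
indices = [code.index(1.0) for code in codes]  # one category index per head
```

Each frame is then described by a small tuple of discrete category indices, one per head, rather than by a continuous vector.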
Findings
Outperforms baseline methods in quality metrics
Achieves state-of-the-art realism in qualitative evaluations
Perceptual user study rates MeshTalk as more realistic than the prior state of the art in over 75% of cases
Abstract
This paper presents a generic method for generating full facial 3D animation from speech. Existing approaches to audio-driven facial animation exhibit uncanny or static upper face animation, fail to produce accurate and plausible co-articulation or rely on person-specific models that limit their scalability. To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face. At the core of our approach is a categorical latent space for facial animation that disentangles audio-correlated and audio-uncorrelated information based on a novel cross-modality loss. Our approach ensures highly accurate lip motion, while also synthesizing plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eyebrow motion. We demonstrate that our…
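The disentanglement idea can be sketched as a loss over two training sequences. Sequence 1 is reconstructed twice. The first reconstruction uses its own expression input with the audio of sequence 2. The second uses the expression input of sequence 2 with its own audio. Upper-face error is penalized on the first and mouth error on the second. As a result, the upper face has to be carried by the expression input and the mouth by the audio.

This is a minimal sketch under stated assumptions:
- It covers a single frame.
- Meshes are lists of (x, y, z) vertices.
- Regions are selected with per-vertex mask weights.
- The reconstructions are passed in directly instead of coming from a real encoder and decoder.

The function names and the exact weighting are hypothetical and are not taken from the paper.

```python
def masked_sq_error(pred, target, mask):
    """Sum of per-vertex squared distances, weighted by a per-vertex mask."""
    total = 0.0
    for p, t, w in zip(pred, target, mask):
        total += w * sum((pc - tc) ** 2 for pc, tc in zip(p, t))
    return total


def cross_modality_loss(recon_expr1_audio2, recon_expr2_audio1, target1,
                        upper_mask, mouth_mask):
    """Hypothetical cross-modality loss for one frame of sequence 1.

    recon_expr1_audio2: decoded from sequence 1's expression input + sequence 2's audio.
    recon_expr2_audio1: decoded from sequence 2's expression input + sequence 1's audio.
    """
    # Audio came from the wrong sequence, so the upper face must come from the expression input.
    upper = masked_sq_error(recon_expr1_audio2, target1, upper_mask)
    # Expression came from the wrong sequence, so the mouth must come from the audio input.
    mouth = masked_sq_error(recon_expr2_audio1, target1, mouth_mask)
    return upper + mouth


# Toy two-vertex "mesh": vertex 0 stands for the upper face, vertex 1 for the mouth.
target1 = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
upper_mask = [1.0, 0.0]
mouth_mask = [0.0, 1.0]
recon_a = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]  # correct upper face; mouth is masked out
recon_b = [(9.0, 9.0, 9.0), (1.0, 1.0, 2.0)]  # upper face is masked out; mouth off by 1 in z
loss = cross_modality_loss(recon_a, recon_b, target1, upper_mask, mouth_mask)  # 1.0
```

Swapping inputs across sequences removes the shortcut of reading everything from one modality. With mismatched audio, only the expression input can explain the upper face. With a mismatched expression input, only the audio can explain the mouth.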
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Face and Expression Recognition
