Diverse Code Query Learning for Speech-Driven Facial Animation

Chunzhi Gu; Shigeru Kuriyama; Katsuya Hotta

arXiv:2409.19143 · cs.CV · October 1, 2024

Open Access

TL;DR

This paper introduces a novel approach for speech-driven facial animation that generates diverse and controllable lip-synchronized 3D faces by encouraging sample diversity and leveraging a vector-quantized variational auto-encoding framework.

Contribution

It proposes a new method to produce multiple plausible facial motions from the same speech input, explicitly promoting diversity and control in the synthesis process.

Findings

01. Achieves state-of-the-art diversity in facial animation synthesis.

02. Effectively models the stochastic nature of facial motions.

03. Demonstrates superior qualitative and quantitative performance.

Abstract

Speech-driven facial animation aims to synthesize lip-synchronized 3D talking faces following the given speech signal. Prior methods for this task mostly focus on pursuing realism with deterministic systems, yet characterizing the potentially stochastic nature of facial motions has rarely been studied to date. While generative modeling approaches can easily handle the one-to-many mapping by repeatedly drawing samples, ensuring a diverse mode coverage of plausible facial motions on small-scale datasets remains challenging and less explored. In this paper, we propose predicting multiple samples conditioned on the same audio signal and then explicitly encouraging sample diversity to address diverse facial animation synthesis. Our core insight is to guide our model to explore the expressive facial latent space with a diversity-promoting loss such that the desired latent codes for…
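
To make the idea of "explicitly encouraging sample diversity" concrete, here is a minimal sketch of one common way to penalize several samples for collapsing onto each other. The abstract does not give the paper's actual loss, so the kernel form, the function name pairwise_diversity_loss, the sigma parameter, and the toy latent codes are all assumptions made for illustration only.

```python
import math
from itertools import combinations


def pairwise_diversity_loss(samples, sigma=1.0):
    """Mean of exp(-||a - b||^2 / sigma) over all unordered sample pairs.

    Lower values mean the samples lie further apart. This kernel form is an
    illustrative assumption, not the loss defined in the paper.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to measure diversity")
    total = 0.0
    count = 0
    for a, b in combinations(samples, 2):
        # Squared Euclidean distance between two latent codes.
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        total += math.exp(-sq_dist / sigma)
        count += 1
    return total / count


# Hypothetical K = 3 latent codes predicted for the same audio clip.
collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # all samples identical
spread = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]     # samples cover distinct modes

loss_collapsed = pairwise_diversity_loss(collapsed)
loss_spread = pairwise_diversity_loss(spread)
```

Minimizing a term of this kind pushes the samples drawn for one audio input apart, so identical samples receive the largest penalty. In practice such a term would presumably be balanced against a reconstruction or lip-sync objective so that diversity does not come at the cost of plausibility.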

Taxonomy

Topics: Face recognition and analysis · Human Motion and Animation · Video Analysis and Summarization

Methods: Sparse Evolutionary Training · Focus