Learning Audio-Driven Viseme Dynamics for 3D Face Animation
Linchao Bao, Haoxian Zhang, Yue Qian, Tangli Xue, Changhai Chen, Xuefei Zhe, Di Kang

TL;DR
This paper introduces a new audio-driven method for generating realistic 3D lip-synchronized facial animations by learning viseme dynamics from speech videos, supporting multilingual inputs and improving controllability for animation workflows.
Contribution
It proposes a novel parametric viseme fitting algorithm guided by phonemes and leverages pretrained deep audio features for robust, multilingual viseme curve prediction, enhancing animation realism and flexibility.
Findings
Achieves state-of-the-art performance in viseme curve prediction.
Supports multilingual speech inputs with high generalizability.
Produces realistic, natural facial animations across different characters.
Abstract
We present a novel audio-driven facial animation approach that can generate realistic lip-synchronized 3D facial animations from the input audio. Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech inputs. The core of our approach is a novel parametric viseme fitting algorithm that utilizes phoneme priors to extract viseme parameters from speech videos. With the guidance of phonemes, the extracted viseme curves can better correlate with phonemes, thus more controllable and friendly to animators. To support multilingual speech inputs and generalizability to unseen voices, we take advantage of deep audio feature models pretrained on multiple languages to learn the mapping from audio to viseme curves. Our audio-to-curves mapping achieves state-of-the-art performance even when the input audio suffers from…
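Example
The abstract does not say how the predicted viseme curves drive a character. One common convention, assumed here, is linear blendshape mixing: each viseme has a target mesh, and each curve gives that viseme's per-frame weight. The sketch below is illustrative only and is not the authors' implementation; the function names, the viseme labels "AA" and "OO", and the two-vertex mesh are all hypothetical.

```python
from typing import Dict, List, Sequence

Vertex = Sequence[float]  # (x, y, z)


def blend_frame(
    neutral: Sequence[Vertex],
    viseme_shapes: Dict[str, Sequence[Vertex]],
    weights: Dict[str, float],
) -> List[List[float]]:
    """Assumed linear blendshape model:
    vertex = neutral + sum_k w_k * (shape_k - neutral)."""
    out = [list(v) for v in neutral]
    for name, w in weights.items():
        shape = viseme_shapes[name]
        for i, (n, s) in enumerate(zip(neutral, shape)):
            for axis in range(3):
                out[i][axis] += w * (s[axis] - n[axis])
    return out


def animate(
    neutral: Sequence[Vertex],
    viseme_shapes: Dict[str, Sequence[Vertex]],
    curves: Dict[str, Sequence[float]],
) -> List[List[List[float]]]:
    """Evaluate every frame; curves maps viseme name -> per-frame weights."""
    n_frames = len(next(iter(curves.values())))
    return [
        blend_frame(neutral, viseme_shapes, {k: c[t] for k, c in curves.items()})
        for t in range(n_frames)
    ]


# Toy two-vertex "mesh" with two hypothetical viseme targets.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
viseme_shapes = {
    "AA": [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0)],  # moves vertex 0 down
    "OO": [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],   # pulls vertex 1 inward
}
# Hypothetical viseme curves over three frames.
curves = {"AA": [0.0, 0.5, 1.0], "OO": [0.0, 0.5, 0.0]}

frames = animate(neutral, viseme_shapes, curves)
```

Because the curves are plain per-viseme weights over time, an animator can edit one curve without touching the others, and the same curves can be applied to any character that provides meshes for the same viseme set.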
Taxonomy
Topics: Face recognition and analysis · Human Motion and Animation · Speech and Audio Processing
