VisemeNet: Audio-Driven Animator-Centric Speech Animation

Yang Zhou; Zhan Xu; Chris Landreth; Evangelos Kalogerakis; Subhransu Maji; Karan Singh

arXiv:1805.09488 · cs.GR · June 8, 2018 · 21 citations

TL;DR

VisemeNet introduces a real-time, deep-learning approach for generating animator-centric speech motion curves from audio, enhancing lip-sync accuracy and style encoding for animation pipelines.

Contribution

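It presents a three-stage LSTM architecture that goes from audio to phonetic groups, to facial-landmark motion (which captures speech style), and finally to viseme motion curves. Because these curves drive JALI or standard FACS-based production face rigs, they slot into existing animation workflows.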

Findings

01

Achieves accurate lip-synchronization, validated by cross-validation against ground-truth data and by animator critique and edits.

02

Robust to variation across speakers and languages.

03

Compares favorably with recent deep-learning lip-sync methods in visual comparisons.

Abstract

We present a novel deep-learning-based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycho-linguistic insights: segmenting speech audio into a stream of phonetic-groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated to the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is an automatic, real-time solution for lip-synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results by: cross-validation to ground-truth data; animator critique and edits; visual comparison to recent deep-learning lip-synchronization solutions; and showing our approach to…
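The three-stage design in the abstract can be illustrated with a toy data-flow sketch. This is not the authors' implementation. Every class name, dimension, and the random, untrained weights are hypothetical. We assume, based on the abstract's description, that stage one predicts phonetic-group probabilities from audio, stage two predicts facial-landmark values from audio, and stage three consumes the audio together with both earlier outputs to produce per-frame viseme curve values in (0, 1). The LSTM is a plain single-layer cell written in pure Python so that the sketch runs without dependencies.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """Single-layer LSTM with random, UNTRAINED weights (illustration only)."""

    GATES = ("input", "forget", "output", "cand")

    def __init__(self, in_dim, hid_dim, rng):
        self.hid = hid_dim
        # One weight matrix per gate acting on the concatenation [x_t, h_{t-1}].
        self.W = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in self.GATES}
        self.b = {g: [0.0] * hid_dim for g in self.GATES}

    def run(self, xs):
        h, c = [0.0] * self.hid, [0.0] * self.hid
        outputs = []
        for x in xs:
            z = list(x) + h
            pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
                   for g in self.GATES}
            i = [sigmoid(v) for v in pre["input"]]
            f = [sigmoid(v) for v in pre["forget"]]
            o = [sigmoid(v) for v in pre["output"]]
            cand = [math.tanh(v) for v in pre["cand"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class ThreeStageSketch:
    """Hypothetical three-stage data flow: phoneme groups, landmarks, viseme curves."""

    def __init__(self, audio_dim, n_phoneme_groups, n_landmarks, n_viseme_params,
                 hid=16, seed=0):
        rng = random.Random(seed)
        # Stage 1: audio -> phonetic-group probabilities.
        self.phon_lstm = LSTM(audio_dim, hid, rng)
        self.phon_W = rand_matrix(n_phoneme_groups, hid, rng)
        # Stage 2: audio -> facial-landmark values (carrying speech style).
        self.lmk_lstm = LSTM(audio_dim, hid, rng)
        self.lmk_W = rand_matrix(n_landmarks, hid, rng)
        # Stage 3 (assumed): audio + stage-1 + stage-2 outputs -> viseme curves.
        self.vis_lstm = LSTM(audio_dim + n_phoneme_groups + n_landmarks, hid, rng)
        self.vis_W = rand_matrix(n_viseme_params, hid, rng)

    def __call__(self, audio_frames):
        phon = [softmax(matvec(self.phon_W, h)) for h in self.phon_lstm.run(audio_frames)]
        lmk = [matvec(self.lmk_W, h) for h in self.lmk_lstm.run(audio_frames)]
        stage3_in = [list(a) + p + l for a, p, l in zip(audio_frames, phon, lmk)]
        # Sigmoid keeps each per-frame curve value in (0, 1), like a rig slider.
        curves = [[sigmoid(v) for v in matvec(self.vis_W, h)]
                  for h in self.vis_lstm.run(stage3_in)]
        return phon, lmk, curves


if __name__ == "__main__":
    rng = random.Random(1)
    # 50 frames of 13 placeholder "audio features" (random noise, not real audio).
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]
    model = ThreeStageSketch(audio_dim=13, n_phoneme_groups=8,
                             n_landmarks=10, n_viseme_params=12)
    phon, lmk, curves = model(frames)
    print(len(curves), len(curves[0]))  # 50 12
```

Replacing the untrained cells with trained ones in any deep-learning framework would leave the data flow unchanged. The sketch is only meant to show that, at every frame, the final stage sees both the phonetic content and the style-bearing landmark signal.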

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Human Motion and Animation