UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model

Xiangyu Fan; Jiaqi Li; Zhiqian Lin; Weiye Xiao; Lei Yang

arXiv:2408.00762·cs.CV·August 2, 2024

TL;DR

UniTalker is a unified model that leverages diverse datasets to significantly improve audio-driven 3D facial animation, achieving better accuracy and generalization across multiple audio domains.

Contribution

The paper introduces UniTalker, a multi-head unified model that effectively utilizes datasets with varied annotations, scaling training data to 18.5 hours and improving animation accuracy.
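As a rough illustration of the multi-head idea, the sketch below routes one shared representation through a separate output head per annotation format. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the head names, all dimensions, the single-layer "encoder", and the `forward` helper are hypothetical and chosen only to show how datasets with different output sizes can share most of a model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical annotation formats: name -> per-frame output dimension.
# These sizes are illustrative and do not come from the paper.
HEAD_DIMS = {"vertices_a": 3 * 100, "vertices_b": 3 * 50, "blendshapes": 51}
AUDIO_DIM = 16    # assumed size of a per-frame audio feature
HIDDEN_DIM = 32   # assumed size of the shared representation

# Shared "encoder": a single linear map reused for every dataset.
W_shared = rng.standard_normal((AUDIO_DIM, HIDDEN_DIM)) * 0.1

# One linear output head per annotation format.
heads = {
    name: rng.standard_normal((HIDDEN_DIM, dim)) * 0.1
    for name, dim in HEAD_DIMS.items()
}

def forward(audio_features, dataset):
    """Map (frames, AUDIO_DIM) audio features to the given dataset's format."""
    hidden = np.tanh(audio_features @ W_shared)  # shared across all datasets
    return hidden @ heads[dataset]               # dataset-specific projection

audio = rng.standard_normal((20, AUDIO_DIM))     # 20 frames of synthetic input
outputs = {name: forward(audio, name) for name in HEAD_DIMS}
for name, out in outputs.items():
    print(name, out.shape)
```

In this arrangement only the final projection differs between datasets, so every training sample, whatever its annotation format, updates the shared parameters.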

Findings

1. Achieves error reductions of 9.2% on BIWI and 13.7% on Vocaset.

2. The pre-trained UniTalker serves as a strong foundation for further fine-tuning.

3. Fine-tuning the pre-trained model surpasses prior state-of-the-art models even with less training data.

Abstract

Audio-driven 3D facial animation aims to map input audio to realistic facial motion. Despite significant progress, limitations arise from inconsistent 3D annotations, restricting previous models to training on specific annotations and thereby constraining the training scale. In this work, we present UniTalker, a unified model featuring a multi-head architecture designed to effectively leverage datasets with varied annotations. To enhance training stability and ensure consistency among multi-head outputs, we employ three training strategies, namely, PCA, model warm-up, and pivot identity embedding. To expand the training scale and diversity, we assemble A2F-Bench, comprising five publicly available datasets and three newly curated datasets. These datasets contain a wide range of audio domains, covering multilingual speech voices and songs, thereby scaling the training data from commonly…
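The abstract names PCA as one of the three training strategies but the excerpt does not say how it is applied. The sketch below shows only the generic technique: compressing high-dimensional per-frame targets into a few principal-component coefficients and reconstructing them. It is a minimal NumPy sketch on synthetic data; the helper names (`fit_pca`, `encode`, `decode`) and all sizes are assumptions for illustration, not details from the paper.

```python
import numpy as np

def fit_pca(X, k):
    """Return the mean and the top-k principal directions of X (rows = frames)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are orthonormal principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def encode(X, mean, components):
    """Project frames onto the principal directions."""
    return (X - mean) @ components.T

def decode(Z, mean, components):
    """Reconstruct frames from their PCA coefficients."""
    return Z @ components + mean

rng = np.random.default_rng(0)

# Synthetic stand-in for facial motion: 200 frames of 300-dim targets
# driven by 8 latent factors plus a little noise (all sizes hypothetical).
latent = rng.standard_normal((200, 8))
mixing = rng.standard_normal((8, 300))
frames = latent @ mixing + 0.01 * rng.standard_normal((200, 300))

mean, components = fit_pca(frames, k=8)
codes = encode(frames, mean, components)
recon = decode(codes, mean, components)

rel_error = np.linalg.norm(recon - frames) / np.linalg.norm(frames)
print(codes.shape, rel_error)
```

One plausible motivation, stated here as an assumption, is that predicting a short coefficient vector instead of thousands of vertex coordinates keeps the output heads of different datasets at comparable sizes.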

Code & Models

Repositories

x-niper/unitalker (PyTorch)

Taxonomy

Topics: Face recognition and analysis

Methods: Principal Components Analysis