Audio-Driven 3D Facial Animation from In-the-Wild Videos
Liying Lu, Tianke Zhang, Yunfei Liu, Xuangeng Chu, Yu Li

TL;DR
This paper introduces an audio-driven 3D facial animation method trained on in-the-wild 2D talking-head videos, which improves generalization and lip synchronization and lets the model reproduce individual speaking styles.
Contribution
The method combines abundant 2D talking-head videos with existing 3D face reconstruction methods to train an audio-driven 3D facial animation model, rather than relying only on the limited public datasets of audio-3D scan pairs used by prior approaches.
Findings
Outperforms existing methods in lip synchronization quality.
Effectively captures individual speaking styles.
Demonstrates superior generalization on diverse videos.
Abstract
Given an arbitrary audio clip, audio-driven 3D facial animation aims to generate lifelike lip motions and facial expressions for a 3D head. Existing methods typically rely on training their models using limited public 3D datasets that contain a restricted number of audio-3D scan pairs. Consequently, their generalization capability remains limited. In this paper, we propose a novel method that leverages in-the-wild 2D talking-head videos to train our 3D facial animation model. The abundance of easily accessible 2D talking-head videos equips our model with a robust generalization capability. By combining these videos with existing 3D face reconstruction methods, our model excels in generating consistent and high-fidelity lip synchronization. Additionally, our model proficiently captures the speaking styles of different individuals, allowing it to generate 3D talking-heads with distinct…
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing
