Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization

Bin Lin; Yanzhen Yu; Jianhao Ye; Ruitao Lv; Yuguang Yang; Ruoye Xie; Pan Yu; Hongbin Zhou

arXiv:2410.14283 · cs.CV · October 21, 2024

TL;DR

Takin-ADA is a real-time, high-resolution audio-driven facial animation method that improves expression transfer, lip-sync accuracy, and control over facial dynamics, outperforming existing solutions.

Contribution

We introduce Takin-ADA, a two-stage approach with novel loss functions and audio processing techniques for enhanced, controllable, and high-quality facial animation.

Findings

01. Achieves 42 FPS at 512x512 resolution on RTX 4090

02. Outperforms existing methods in video quality and realism

03. Provides flexible control over facial expressions and head motions
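
To put the reported speed in context, the short sketch below converts the stated 42 FPS at 512x512 into a per-frame time budget and a pixel throughput. It is plain arithmetic on the figures quoted above, not code from the paper; the function names `frame_budget_ms` and `pixels_per_second` are hypothetical helpers introduced only for this illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: float) -> float:
    """Return how many output pixels are generated per second."""
    return width * height * fps


# Figures quoted on this page: 42 FPS at 512x512.
budget = frame_budget_ms(42)
throughput = pixels_per_second(512, 512, 42)

print(f"{budget:.1f} ms per frame")            # 23.8 ms per frame
print(f"{throughput:,.0f} pixels per second")  # 11,010,048 pixels per second
```

At 42 FPS the whole pipeline has roughly 23.8 ms per frame, which is under the 40 ms budget of 25 FPS video, so the reported rate is consistent with the real-time claim.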

Abstract

Existing audio-driven facial animation methods face critical challenges, including expression leakage, ineffective subtle expression transfer, and imprecise audio-driven synchronization. We discovered that these issues stem from limitations in motion representation and the lack of fine-grained control over facial expressions. To address these problems, we present Takin-ADA, a novel two-stage approach for real-time audio-driven portrait animation. In the first stage, we introduce a specialized loss function that enhances subtle expression transfer while reducing unwanted expression leakage. The second stage utilizes an advanced audio processing technique to improve lip-sync accuracy. Our method not only generates precise lip movements but also allows flexible control over facial expressions and head motions. Takin-ADA achieves high-resolution (512x512) facial animations at up to 42 FPS…

Taxonomy

Topics: Music Technology and Sound Studies · Advanced Vision and Imaging · Music and Audio Processing