LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement

Rui Zhang; Yixiao Fang; Zhengnan Lu; Pei Cheng; Zebiao Huang; Bin Fu

arXiv:2407.18595·cs.CV·July 29, 2024

TL;DR

LinguaLinker is a diffusion-based framework that synchronizes facial animations with multilingual audio inputs, improving lip-sync accuracy and portrait fidelity for diverse languages.

Contribution

It introduces a holistic diffusion-based approach that implicitly controls facial movements from audio features, surpassing traditional parametric models in animation quality.

Findings

01. Enhanced lip-sync accuracy and portrait fidelity.

02. Versatile application across different languages.

03. Improved control over facial motion nuances.

Abstract

This study examines how to synchronize facial dynamics with multilingual audio inputs, focusing on the creation of visually compelling, time-synchronized animations through diffusion-based techniques. Diverging from traditional parametric models for facial animation, our approach, termed LinguaLinker, adopts a holistic diffusion-based framework that integrates audio-driven visual synthesis to strengthen the coupling between the audio input and the generated visual response. We process audio features separately and derive the corresponding control gates, which implicitly govern the movements of the mouth, eyes, and head, irrespective of the portrait's origin. The audio-driven visual synthesis mechanism provides nuanced control while keeping the output video consistent with the input audio, allowing for a more tailored and effective portrayal of distinct personas across different…
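
The idea of deriving per-region control gates from audio features can be illustrated with a small sketch. This is not the paper's architecture: the linear projection, the sigmoid gate, the random weights, and all function names (`make_gate_params`, `region_gates`, `apply_gates`) are assumptions made for illustration only. The only detail taken from the abstract is that separate gates govern the mouth, eyes, and head.

```python
import numpy as np

# Regions named in the abstract; everything else below is hypothetical.
REGIONS = ("mouth", "eyes", "head")


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_gate_params(feat_dim, seed=0):
    """Create one (weight, bias) pair per region (random stand-ins for learned weights)."""
    rng = np.random.default_rng(seed)
    return {r: (rng.normal(scale=0.1, size=feat_dim), 0.0) for r in REGIONS}


def region_gates(audio_feats, params):
    """Map audio features of shape (T, D) to one gate value in (0, 1) per frame and region."""
    return {r: sigmoid(audio_feats @ w + b) for r, (w, b) in params.items()}


def apply_gates(audio_feats, gates):
    """Scale the audio features by each region's gate, giving one (T, D) signal per region."""
    return {r: g[:, None] * audio_feats for r, g in gates.items()}


# Toy input: 8 frames of 16-dimensional audio features (assumed sizes).
T, D = 8, 16
audio_feats = np.random.default_rng(1).normal(size=(T, D))

params = make_gate_params(D)
gates = region_gates(audio_feats, params)
conditioned = apply_gates(audio_feats, gates)

for region in REGIONS:
    print(region, gates[region].shape, conditioned[region].shape)
```

In a sketch like this, each region receives its own scaled copy of the audio signal, so a downstream generator could respond more strongly to audio in the mouth region than in the head region. How the actual model computes and consumes its gates is not specified in the text above.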

Taxonomy

Topics: Human Motion and Animation · Subtitles and Audiovisual Media · Face recognition and analysis