LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation

Deng Junli; Luo Yihao; Yang Xueting; Li Siyou; Wang Wei; Guo Jinyang; Shi Ping

arXiv:2409.09326 · cs.CV · September 17, 2024

TL;DR

LawDNet introduces a novel deep-learning approach with local affine warping to improve the realism, diversity, and temporal coherence of audio-driven lip synthesis for photorealistic avatars.

Contribution

The paper presents LawDNet, a new architecture that models lip movements with local affine warping fields and adds a dual-stream discriminator for frame-to-frame continuity, improving lip-synthesis quality and robustness.

Findings

1. Outperforms previous methods in lip-movement realism and diversity
2. Achieves superior temporal coherence and robustness in lip synthesis
3. Provides publicly accessible source code and pre-trained models for the research community

Abstract

In the domain of photorealistic avatar generation, the fidelity of audio-driven lip motion synthesis is essential for realistic virtual interactions. Existing methods face two key challenges: a lack of vivacity due to limited diversity in generated lip poses, and noticeably distorted (anamorphic) motion caused by poor temporal coherence. To address these issues, we propose LawDNet, a novel deep-learning architecture that enhances lip synthesis through a Local Affine Warping Deformation mechanism. This mechanism models the intricate lip movements in response to the audio input through controllable non-linear warping fields. These fields consist of local affine transformations focused on abstract keypoints within deep feature maps, offering a novel universal paradigm for feature warping in networks. Additionally, LawDNet incorporates a dual-stream discriminator for improved frame-to-frame continuity and…
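The abstract describes the warping field only at a high level. The following is a minimal pure-Python sketch of one plausible reading, not the formulation from the paper or from the iPaw-AI-LAB/LawDNet_2024 repository. It assumes that each abstract keypoint carries a 2×2 matrix and a translation, that the per-keypoint affine maps are blended with Gaussian weights (plus a small identity "background" term) into one smooth, spatially varying sampling field, and that a single-channel map is then backward-warped with bilinear sampling. The function names and the `sigma` and `bg_weight` parameters are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]  # rows of a 2x2 matrix


def warp_coordinate(x: Point, keypoints: Sequence[Point], matrices: Sequence[Matrix2],
                    translations: Sequence[Point], sigma: float = 0.25,
                    bg_weight: float = 0.01) -> Point:
    """Blend per-keypoint affine maps into one source location for x (hypothetical formulation)."""
    # Identity "background" term keeps regions far from every keypoint nearly unchanged.
    sx, sy, total = bg_weight * x[0], bg_weight * x[1], bg_weight
    for (px, py), ((a, b), (c, d)), (tx, ty) in zip(keypoints, matrices, translations):
        dx, dy = x[0] - px, x[1] - py
        # Gaussian weight: each affine map mainly acts near its own keypoint.
        w = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
        # Local affine map centred on the keypoint: A (x - p) + p + t.
        sx += w * (a * dx + b * dy + px + tx)
        sy += w * (c * dx + d * dy + py + ty)
        total += w
    # Normalised weighted average gives a smooth, non-linear overall field.
    return sx / total, sy / total


def bilinear_sample(feature: List[List[float]], x: float, y: float) -> float:
    """Sample a 2D grid (list of rows) at pixel coordinates (x, y), clamping to the border."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = feature[y0][x0] * (1 - fx) + feature[y0][x1] * fx
    bottom = feature[y1][x0] * (1 - fx) + feature[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_feature_map(feature: List[List[float]], keypoints: Sequence[Point],
                     matrices: Sequence[Matrix2], translations: Sequence[Point],
                     sigma: float = 0.25, bg_weight: float = 0.01) -> List[List[float]]:
    """Backward-warp a single-channel H x W map (H, W >= 2); keypoints use [0, 1] coordinates."""
    h, w = len(feature), len(feature[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Normalised coordinates of the output pixel.
            x = (j / (w - 1), i / (h - 1))
            sx, sy = warp_coordinate(x, keypoints, matrices, translations, sigma, bg_weight)
            # Convert back to pixel units and read the source map there.
            row.append(bilinear_sample(feature, sx * (w - 1), sy * (h - 1)))
        out.append(row)
    return out


if __name__ == "__main__":
    ramp = [[float(c) for c in range(5)] for _ in range(5)]  # horizontal ramp 0..4
    identity = ((1.0, 0.0), (0.0, 1.0))
    # Translate features around a single centre keypoint by +0.25 (normalised units).
    shifted = warp_feature_map(ramp, [(0.5, 0.5)], [identity], [(0.25, 0.0)], sigma=0.2)
    print([round(v, 2) for v in shifted[2]])
```

Because every affine map is weighted by distance to its keypoint, a translation attached to one keypoint moves nearby features strongly and leaves distant ones almost in place, which is what makes the combined field non-linear even though each piece is affine. In an actual network the keypoints, matrices and translations would presumably be predicted from the audio and reference features, and sampling would run on batched multi-channel tensors (for example with PyTorch's `torch.nn.functional.grid_sample`) rather than nested Python loops.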

Code & Models

Repositories

iPaw-AI-LAB/LawDNet_2024 (PyTorch, official implementation)

Taxonomy

Topics: Speech and Audio Processing