Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

Runyi Yu; Tianyu He; Ailing Zhang; Yuchi Wang; Junliang Guo; Xu Tan; Chang Liu; Jie Chen; Jiang Bian

arXiv:2406.08096 · cs.CV · June 18, 2024

TL;DR

This paper introduces a novel approach for lip sync in talking videos by disentangling motion and appearance, enabling high-fidelity, personalized lip synchronization that generalizes well across different identities.

Contribution

The paper proposes a two-stage pipeline: a speech-to-motion diffusion model, followed by a motion-conditioned appearance generator. Motion is represented with landmarks to preserve personal identity, and separate encoders are used to preserve visual details.
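To make the two-stage decomposition concrete, here is a minimal Python sketch of the data flow only. It is not the authors' implementation: the function names, the flattened landmark representation, and both toy stand-ins (`toy_denoiser`, `toy_generator`) are assumptions made for illustration, and the refinement loop is a simplified placeholder for diffusion sampling with no noise schedule or trained network.

```python
import random
from typing import Any, Callable, List

Landmarks = List[float]  # assumed: flattened (x, y) lip landmarks for one frame


def speech_to_motion(
    speech_feats: List[List[float]],
    denoiser: Callable[[Landmarks, List[float], int], Landmarks],
    n_points: int = 4,
    steps: int = 10,
    seed: int = 0,
) -> List[Landmarks]:
    """Stage 1 (toy): start each frame from Gaussian noise and refine it
    step by step, conditioned on that frame's speech features."""
    rng = random.Random(seed)
    motion = []
    for feats in speech_feats:
        x = [rng.gauss(0.0, 1.0) for _ in range(2 * n_points)]
        for t in reversed(range(steps)):
            x = denoiser(x, feats, t)
        motion.append(x)
    return motion


def render_appearance(
    motion: List[Landmarks],
    reference_frame: Any,
    generator: Callable[[Landmarks, Any], Any],
) -> List[Any]:
    """Stage 2 (toy): produce one output frame per landmark set,
    conditioned on a reference frame that carries the appearance."""
    return [generator(landmarks, reference_frame) for landmarks in motion]


def toy_denoiser(x: Landmarks, feats: List[float], t: int) -> Landmarks:
    # Hypothetical stand-in for a trained network: move every coordinate
    # halfway toward the mean of the speech features.
    target = sum(feats) / len(feats)
    return [0.5 * xi + 0.5 * target for xi in x]


def toy_generator(landmarks: Landmarks, reference_frame: Any) -> dict:
    # Hypothetical stand-in for the appearance model: pair the reference
    # frame with the new landmarks instead of synthesizing pixels.
    return {"reference": reference_frame, "landmarks": landmarks}


motion = speech_to_motion([[0.2, 0.4], [1.0, 1.0]], toy_denoiser)
frames = render_appearance(motion, "reference-frame", toy_generator)
```

The point of the sketch is the interface: stage 1 sees only speech and emits motion, while stage 2 sees only motion plus a reference frame, so appearance information never has to pass through the speech-driven model.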

Findings

1. Outperforms existing methods in lip-sync quality

2. Preserves personal identity and visual details effectively

3. Generalizes well to unseen individuals

Abstract

We aim to edit the lip movements in talking video according to the given speech while preserving the personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appearance synthesis. Current solutions handle the two sub-problems within a single generative model, resulting in a challenging trade-off between lip-sync quality and visual details preservation. Instead, we propose to disentangle the motion and appearance, and then generate them one by one with a speech-to-motion diffusion model and a motion-conditioned appearance generation model. However, there still remain challenges in each stage, such as motion-aware identity preservation in (1) and visual details preservation in (2). Therefore, to preserve personal identity, we adopt landmarks to represent the motion, and further employ a landmark-based…

Taxonomy

Topics: Face recognition and analysis

Methods: Diffusion