StyleLipSync: Style-based Personalized Lip-sync Video Generation

Taekyung Ki; Dongchan Min

arXiv:2305.00521 · cs.CV · February 13, 2024 · 1 citation

TL;DR

StyleLipSync is a novel style-based model that generates personalized, high-quality lip-sync videos from arbitrary audio, leveraging a pre-trained StyleGAN and pose-aware masking for naturalness.
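
The abstract says only that the model draws on the latent space of a pre-trained StyleGAN and that video consistency can be designed there with a linear transformation. One way to picture this, purely as our assumption and not the authors' formulation, is a shared reference code `w_ref` for the identity plus a per-frame offset that is a linear map `M` of that frame's audio features, so that w_t = w_ref + M a_t. The names `frame_latents`, `matvec`, `w_ref` and `M` are hypothetical.

```python
from typing import List, Sequence


def matvec(M: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def frame_latents(
    w_ref: Sequence[float],
    audio_feats: Sequence[Sequence[float]],
    M: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Hypothetical per-frame codes: w_t = w_ref + M @ a_t.

    All frames share w_ref, so any two frames differ only by
    M @ (a_t - a_s), a linear function of their audio features.
    """
    return [[w + d for w, d in zip(w_ref, matvec(M, a))] for a in audio_feats]


# Toy example: a 3-D "latent" and 2-D audio features for two frames.
w_ref = [1.0, 2.0, 3.0]
M = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # the third latent dim ignores audio
latents = frame_latents(w_ref, [[0.0, 0.0], [0.5, -1.0]], M)
print(latents)  # [[1.0, 2.0, 3.0], [1.5, 1.0, 3.0]]
```

Because the shared code never changes, anything not reachable through `M` (here the third dimension) stays fixed across the whole clip, which is one plausible sense in which a linear transformation keeps a video consistent.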

Contribution

It introduces a style-based generative approach with pose-aware masking and a few-shot adaptation method for personalized lip-sync video generation.
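
Pose-aware masking, as the abstract describes it, relocates the mask every frame using a 3D parametric mesh predictor, so the masked region follows the head instead of sitting at a fixed image location. The sketch below illustrates that idea under stated assumptions: the mouth-region vertices of a predicted mesh are rotated by the frame's head pose, projected with a weak-perspective camera, and the mask is their padded bounding box. The bounding-box rule, the camera model, the image-aligned vertex frame, and all function names are hypothetical choices, not details from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotate(v: Vec3, R: Sequence[Sequence[float]]) -> Vec3:
    """Apply a 3x3 rotation matrix R (list of rows) to a 3D point."""
    x, y, z = (sum(R[i][j] * v[j] for j in range(3)) for i in range(3))
    return (x, y, z)


def project(v: Vec3, scale: float, tx: float, ty: float) -> Tuple[float, float]:
    """Weak-perspective camera: drop depth, then scale and shift to pixels."""
    return (scale * v[0] + tx, scale * v[1] + ty)


def pose_aware_mask(
    vertices: Sequence[Vec3],
    mouth_idx: Sequence[int],
    R: Sequence[Sequence[float]],
    scale: float,
    tx: float,
    ty: float,
    height: int,
    width: int,
    margin: int = 1,
) -> List[List[int]]:
    """Hypothetical per-frame mask: padded box around projected mouth vertices.

    Vertices are assumed to be in an image-aligned frame (x right, y down).
    """
    pts = [project(rotate(vertices[i], R), scale, tx, ty) for i in mouth_idx]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    # Pad the box by `margin` pixels and clamp it to the image bounds.
    x0 = max(0, math.floor(min(xs)) - margin)
    x1 = min(width - 1, math.ceil(max(xs)) + margin)
    y0 = max(0, math.floor(min(ys)) - margin)
    y1 = min(height - 1, math.ceil(max(ys)) + margin)
    return [
        [1 if (y0 <= r <= y1 and x0 <= c <= x1) else 0 for c in range(width)]
        for r in range(height)
    ]


# Toy mesh: two "mouth" vertices; the same mesh under two head poses.
verts = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
roll_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90 degrees about the viewing axis
m_front = pose_aware_mask(verts, [0, 1], identity, 1.0, 4.0, 4.0, 10, 10)
m_rolled = pose_aware_mask(verts, [0, 1], roll_90, 1.0, 4.0, 4.0, 10, 10)
```

Rotating the head changes the projected mouth points and therefore where the mask lands, which a fixed lower-half mask cannot do.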

Findings

1. Accurately generates lip-sync videos in the zero-shot setting.
2. Enhances the person-specific characteristics of unseen faces using only a short target video.
3. Outperforms previous methods in naturalness and personalization.

Abstract

In this paper, we present StyleLipSync, a style-based personalized lip-sync video generative model that can generate identity-agnostic lip-synchronized video from arbitrary audio. To generate videos of arbitrary identities, we leverage an expressive lip prior from the semantically rich latent space of a pre-trained StyleGAN, in which we can also model video consistency with a linear transformation. In contrast to previous lip-sync methods, we introduce pose-aware masking, which uses a 3D parametric mesh predictor frame by frame to dynamically locate the mask and improve naturalness across frames. Moreover, we propose a few-shot lip-sync adaptation method for an arbitrary person by introducing a sync regularizer that preserves lip-sync generalization while enhancing person-specific visual information. Extensive experiments demonstrate that our model can generate accurate…
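
The abstract does not spell out the sync regularizer. As a stand-in, the sketch below uses the SyncNet-style penalty popularized by Wav2Lip: the negative log of the cosine similarity between an audio embedding and an embedding of the generated mouth region, added with a weight to a reconstruction loss on the target person's few frames. The L1 reconstruction term, the weighting, and the names `adaptation_loss` and `sync_weight` are assumptions; the embeddings would come from a frozen, pre-trained sync expert that is not shown here.

```python
import math
from typing import Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between equally sized vectors (e.g. flattened frames)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def sync_penalty(
    audio_emb: Sequence[float], video_emb: Sequence[float], eps: float = 1e-7
) -> float:
    """Wav2Lip-style -log(cosine), with the cosine clamped to [eps, 1].

    Assumes non-negative (e.g. ReLU) embeddings from a frozen sync expert.
    """
    p = min(1.0, max(eps, cosine(audio_emb, video_emb)))
    return -math.log(p)


def adaptation_loss(
    generated: Sequence[float],
    target: Sequence[float],
    audio_emb: Sequence[float],
    video_emb: Sequence[float],
    sync_weight: float = 0.1,
) -> float:
    """Hypothetical few-shot objective: fit the target person, keep lip sync."""
    return l1(generated, target) + sync_weight * sync_penalty(audio_emb, video_emb)
```

Keeping a sync term during fine-tuning is one way to stop a few-shot update from trading lip-sync accuracy for the target's appearance, which matches the stated purpose of the regularizer.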

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing

Methods: Dense Connections · R1 Regularization · Convolution · Adaptive Instance Normalization · Feedforward Network · StyleGAN
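
Adaptive Instance Normalization, one of the listed methods, is how the original StyleGAN injects a style into a feature map: each channel is normalized to zero mean and unit variance over its spatial positions, then rescaled and shifted by style-dependent values, AdaIN(x) = σ_s (x − μ(x)) / σ(x) + μ_s. The sketch below applies it to one channel given as a flat list of activations; in StyleGAN the scale and bias come from a learned affine transform of the latent code w, which is omitted here. The function name and the `eps` default are our own choices.

```python
import math
from typing import List, Sequence


def adain(
    content: Sequence[float], style_mean: float, style_std: float, eps: float = 1e-5
) -> List[float]:
    """Adaptive Instance Normalization for a single feature channel.

    `content` holds the channel's activations over all spatial positions;
    `style_mean` and `style_std` are the target bias and scale.
    """
    mu = sum(content) / len(content)
    var = sum((c - mu) ** 2 for c in content) / len(content)
    sigma = math.sqrt(var + eps)  # eps avoids division by zero on flat channels
    return [style_std * (c - mu) / sigma + style_mean for c in content]


out = adain([1.0, 2.0, 3.0, 4.0], style_mean=10.0, style_std=2.0)
```

The output keeps the spatial pattern of the input (its ordering is unchanged) while its mean and standard deviation are set by the style, which is what lets a latent code control appearance layer by layer.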