StyleLipSync: Style-based Personalized Lip-sync Video Generation
Taekyung Ki, Dongchan Min

TL;DR
StyleLipSync is a novel style-based model that generates personalized, high-quality lip-sync videos from arbitrary audio, leveraging a pre-trained StyleGAN and pose-aware masking for naturalness.
Contribution
It introduces a style-based generative approach with pose-aware masking and a few-shot adaptation method for personalized lip-sync video generation.
Findings
Accurately generates lip-sync videos in zero-shot settings.
Adapts to the visual characteristics of an unseen face using only a small amount of target video.
Outperforms previous methods in naturalness and personalization.
Abstract
In this paper, we present StyleLipSync, a style-based personalized lip-sync video generative model that can generate identity-agnostic lip-synchronizing video from arbitrary audio. To generate a video of arbitrary identities, we leverage expressive lip prior from the semantically rich latent space of a pre-trained StyleGAN, where we can also design a video consistency with a linear transformation. In contrast to the previous lip-sync methods, we introduce pose-aware masking that dynamically locates the mask to improve the naturalness over frames by utilizing a 3D parametric mesh predictor frame by frame. Moreover, we propose a few-shot lip-sync adaptation method for an arbitrary person by introducing a sync regularizer that preserves lip-sync generalization while enhancing the person-specific visual information. Extensive experiments demonstrate that our model can generate accurate…
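The pose-aware masking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the lower-face vertices of the predicted 3D mesh have already been projected to 2D pixel coordinates for each frame, and it uses a padded bounding box as the mask shape. The names `pose_aware_mask` and `pad`, and the bounding-box choice itself, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def pose_aware_mask(
    points: Sequence[Point], height: int, width: int, pad: int = 2
) -> List[List[int]]:
    """Return a height x width binary mask (1 = masked region).

    Assumption: `points` are 2D projections of lower-face mesh vertices
    for ONE frame. The mask is the padded bounding box of those points,
    so it follows the face as the head pose changes between frames.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Clamp the padded box to the image borders.
    x0 = max(0, int(min(xs)) - pad)
    x1 = min(width - 1, int(max(xs)) + pad)
    y0 = max(0, int(min(ys)) - pad)
    y1 = min(height - 1, int(max(ys)) + pad)
    return [
        [1 if (x0 <= x <= x1 and y0 <= y <= y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def mask_area(mask: List[List[int]]) -> int:
    """Count masked pixels."""
    return sum(sum(row) for row in mask)


# Two frames of the same toy face; the head moves right in the second frame,
# and the mask moves with it instead of staying at a fixed image location.
frame_a = pose_aware_mask([(3.0, 4.0), (5.0, 6.0)], height=10, width=10, pad=1)
frame_b = pose_aware_mask([(6.0, 4.0), (8.0, 6.0)], height=10, width=10, pad=1)
```

A fixed mask would cover the same pixels in both frames. Here the masked region is recomputed per frame from the projected points, which is the property the abstract attributes to pose-aware masking.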
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
Methods: Dense Connections · R1 Regularization · Convolution · Adaptive Instance Normalization · Feedforward Network · StyleGAN
