Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization

Soumik Mukhopadhyay; Saksham Suri; Ravi Teja Gadde; Abhinav Shrivastava

arXiv:2308.09716 · cs.CV · August 21, 2023 · 2 citations




TL;DR

Diff2Lip is a diffusion-based model that achieves high-quality lip synchronization in-the-wild, outperforming previous methods in image quality and synchronization accuracy by leveraging complete contextual information.

Contribution

This paper introduces Diff2Lip, a novel diffusion model for lip-sync that preserves identity and image quality. It is trained on the in-the-wild VoxCeleb2 dataset and surpasses existing methods, including Wav2Lip and PC-AVS, on image-quality and synchronization metrics.

Findings

01

Outperforms Wav2Lip and PC-AVS in FID and MOS scores

02

Effective in both reconstruction and cross settings

03

Operates successfully on in-the-wild videos
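FID, cited in the first finding, is the Fréchet distance between two Gaussians fitted to image features: ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). The following is a minimal NumPy sketch of that formula. It assumes the features have already been extracted; the standard FID uses Inception-v3 pooling activations, and that extractor is omitted here. This is not the paper's evaluation code, and the function names are hypothetical.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # Tr((sigma1 @ sigma2)^(1/2)) equals the sum of square roots of the
    # eigenvalues of sigma1 @ sigma2. For PSD inputs these eigenvalues are
    # real and non-negative up to numerical noise, so keep the real part
    # and clip tiny negatives to zero.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def fid_from_features(feats_a, feats_b):
    """FID-style score from two (N, D) arrays of pre-extracted features."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the D == 1 case as a 1x1 matrix instead of a scalar.
    sig_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    sig_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    return frechet_distance(mu_a, sig_a, mu_b, sig_b)
```

Lower is better. Identical feature sets score approximately 0, and a pure shift in the mean adds the squared shift length.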

Abstract

The task of lip synchronization (lip-sync) seeks to match the lips of human faces with different audio. It has various applications in the film industry as well as for creating virtual avatars and for video conferencing. This is a challenging problem as one needs to simultaneously introduce detailed, realistic lip movements while preserving the identity, pose, emotions, and image quality. Many of the previous methods trying to solve this problem suffer from image quality degradation due to a lack of complete contextual information. In this paper, we present Diff2Lip, an audio-conditioned diffusion-based model that is able to perform lip synchronization in-the-wild while preserving these qualities. We train our model on VoxCeleb2, a video dataset containing in-the-wild talking face videos. Extensive studies show that our method outperforms popular methods like Wav2Lip and PC-AVS in Fréchet…
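The abstract describes Diff2Lip only as an "audio-conditioned diffusion-based model". For orientation, the sketch below shows the generic DDPM recipe with an extra audio conditioning argument threaded through the denoiser. It covers the linear β schedule, the closed-form forward noising x_t = √ᾱ_t·x₀ + √(1 − ᾱ_t)·ε, the ε-prediction loss, and one ancestral sampling step. The `denoiser` callable, its `(x_t, t, audio_cond)` signature, and the schedule values are assumptions for illustration. This is not code from the soumik-kanad/diff2lip repository, and what Diff2Lip feeds its network besides audio is not stated on this page.

```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule (the common DDPM default)."""
    return np.linspace(beta_start, beta_end, T)


def alpha_bars_from(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bars, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise


def training_loss(denoiser, x0, audio_cond, alpha_bars, rng):
    """Epsilon-prediction MSE; audio_cond is a hypothetical conditioning input."""
    t = int(rng.integers(0, len(alpha_bars)))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bars, noise)
    eps_pred = denoiser(x_t, t, audio_cond)  # hypothetical signature
    return float(np.mean((eps_pred - noise) ** 2))


def p_sample_step(denoiser, x_t, t, audio_cond, betas, alpha_bars, rng):
    """One DDPM ancestral step x_t -> x_{t-1}, using sigma_t^2 = beta_t."""
    beta_t = betas[t]
    eps = denoiser(x_t, t, audio_cond)
    mean = (x_t - beta_t / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - beta_t)
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(beta_t) * rng.standard_normal(x_t.shape)
```

In a real model the denoiser would be a trained neural network and `x0` a face frame. A stand-in that returns zeros is enough to exercise the plumbing.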


Code & Models

Repositories

soumik-kanad/diff2lip
PyTorch · Official

Models

🤗 ameerazam08/diff2lip (model)

Videos

Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization· youtube

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis

Methods: 3-Dimensional Convolutional Neural Network