VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Kun Cheng; Xiaodong Cun; Yong Zhang; Menghan Xia; Fei Yin; Mingrui Zhu; Xuan Wang; Jue Wang; Nannan Wang

arXiv:2211.14758·cs.CV·November 29, 2022

TL;DR

VideoReTalking is a comprehensive system that edits real-world talking head videos to match input audio with accurate lip-sync and expression changes, producing high-quality, realistic videos without retraining for individual identities.

Contribution

It introduces a sequential, learning-based pipeline for expression editing, lip-sync, and face enhancement that generalizes across identities without per-person retraining.

Findings

1. Outperforms state-of-the-art methods in lip-sync accuracy
2. Produces higher visual quality in edited videos
3. Operates effectively on in-the-wild examples

Abstract

We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-syncing video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for…
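The three sequential tasks in the abstract can be read as a simple composition of stages. The sketch below is only an illustration of that ordering, not the authors' implementation: the class name `RetalkingPipeline`, its field names, and the toy string-based stages are all hypothetical stand-ins, and the real networks (expression editing, lip-sync, identity-aware enhancement) are not modeled here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

Frame = Any  # hypothetical stand-in for one video frame (e.g. an image array)


@dataclass
class RetalkingPipeline:
    """Illustrative three-stage composition; all names are assumptions."""

    # Stage 1: map a frame to the same frame with a canonical expression.
    edit_expression: Callable[[Frame], Frame]
    # Stage 2: take canonical frames plus audio, return lip-synced frames.
    lip_sync: Callable[[Sequence[Frame], Sequence[Any]], List[Frame]]
    # Stage 3: improve photo-realism of each synthesized frame.
    enhance: Callable[[Frame], Frame]

    def run(self, frames: Sequence[Frame], audio: Sequence[Any]) -> List[Frame]:
        # (1) Every frame is edited toward the same expression template.
        canonical = [self.edit_expression(f) for f in frames]
        # (2) The canonical video and the audio drive the lip-sync stage.
        synced = self.lip_sync(canonical, audio)
        # (3) Enhancement runs last, on the lip-synced output.
        return [self.enhance(f) for f in synced]


# Toy stages that only tag each frame, so the stage order is visible.
demo = RetalkingPipeline(
    edit_expression=lambda f: f + "|canonical",
    lip_sync=lambda fs, au: [f"{f}|sync({a})" for f, a in zip(fs, au)],
    enhance=lambda f: f + "|enhanced",
)

result = demo.run(["f0", "f1"], ["a0", "a1"])
print(result)
```

Because each stage is passed in as a callable, the ordering stays fixed while any stage can be swapped independently, which mirrors the abstract's description of the system as sequential tasks.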

Code & Models

Repositories

vinthony/video-retalking (PyTorch)

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis