Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN

Jiacheng Su; Kunhong Liu; Liyan Chen; Junfeng Yao; Qingsong Liu; Dongdong Lv

arXiv:2407.05577 · cs.CV · July 9, 2024

TL;DR

This paper introduces a novel method for high-resolution, seamless talking head video editing driven by audio, combining emotion-aware landmark prediction and StyleGAN-based editing to improve visual quality.

Contribution

It proposes a two-module framework that predicts emotional landmarks from speech and uses StyleGAN for seamless face video editing, improving visual quality.

Findings

1. Produces high-resolution, high-quality videos
2. Outperforms state-of-the-art methods in visual quality
3. Achieves seamless emotion and content editing

Abstract

Existing methods for audio-driven talking head video editing suffer from poor visual quality. This paper tackles the problem by seamlessly editing talking face images with different emotions using two modules: (1) an audio-to-landmark module, consisting of Cross-Reconstructed Emotion Disentanglement and an alignment network, which bridges the gap between speech and facial motion by predicting the corresponding emotional landmarks from speech; and (2) a landmark-based editing module, which edits face videos via StyleGAN and aims to generate a seamless edited video whose emotion and content components come from the input audio. Extensive experiments confirm that, compared with state-of-the-art methods, our method produces high-resolution videos with high visual quality.
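The abstract names Cross-Reconstructed Emotion Disentanglement but does not give its objective. As a hedged sketch, assuming it follows the cross-reconstruction scheme this component's name comes from (introduced in EVP, Ji et al., CVPR 2021): an audio clip x_{i,m} carries content i and emotion m, a content encoder E_c and an emotion encoder E_e encode it, and a decoder D must rebuild clips from swapped codes. The symbols and the squared-L2 distance are illustrative assumptions, not the paper's stated loss.

```latex
\begin{aligned}
\mathcal{L}_{\text{cross}} &= \big\lVert D\big(E_c(x_{i,m}),\, E_e(x_{j,n})\big) - x_{i,n} \big\rVert_2^2
  + \big\lVert D\big(E_c(x_{j,n}),\, E_e(x_{i,m})\big) - x_{j,m} \big\rVert_2^2, \\
\mathcal{L}_{\text{self}} &= \big\lVert D\big(E_c(x_{i,m}),\, E_e(x_{i,m})\big) - x_{i,m} \big\rVert_2^2 .
\end{aligned}
```

The cross term can only be small if E_c encodes what is said and E_e encodes how it is said, which is the separation an audio-to-landmark module needs before mapping emotion and content codes to landmarks. It also requires training clips that share content across different emotions.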

Taxonomy

Topics: Video Analysis and Summarization · Speech and Audio Processing · Advanced Data Compression Techniques

Methods: Dense Connections · Convolution · Feedforward Network · Adaptive Instance Normalization · R1 Regularization · StyleGAN
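Adaptive Instance Normalization, listed among the methods above, is how StyleGAN injects a style into each generator layer: every feature channel is normalized to zero mean and unit variance over its spatial positions, then rescaled and shifted by per-channel style parameters that StyleGAN derives from the latent code through a learned affine transform. The NumPy sketch below shows that operation in isolation. The function name `adain`, the (N, C, H, W) layout, and the `eps` value are illustrative assumptions, not code from the paper.

```python
import numpy as np


def adain(x, style_scale, style_bias, eps=1e-8):
    """Adaptive Instance Normalization in the StyleGAN form (illustrative sketch).

    x: feature maps of shape (N, C, H, W).
    style_scale, style_bias: per-channel style parameters of shape (N, C);
        in StyleGAN these come from a learned affine map of the latent w.
    """
    # Per-sample, per-channel statistics over the spatial dimensions.
    mean = x.mean(axis=(2, 3), keepdims=True)
    std = x.std(axis=(2, 3), keepdims=True)
    # Normalize each channel to roughly zero mean and unit variance.
    normalized = (x - mean) / (std + eps)
    # Re-style each channel with the supplied scale (std) and bias (mean).
    return style_scale[:, :, None, None] * normalized + style_bias[:, :, None, None]


# Usage: random features re-styled to per-channel mean 0.5 and std 2.0.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 3, 4, 4))
styled = adain(features, np.full((2, 3), 2.0), np.full((2, 3), 0.5))
print(styled.mean(axis=(2, 3)).round(3))
```

Because the statistics are recomputed per sample and per channel, the style parameters fully determine each channel's first two moments regardless of the incoming features, which is what lets StyleGAN control appearance layer by layer.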