Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels

Zipeng Ye; Mengfei Xia; Ran Yi; Juyong Zhang; Yu-Kun Lai; Xuwei Huang; Guoxin Zhang; Yong-jin Liu

arXiv:2201.05986·cs.CV·April 20, 2022

TL;DR

This paper introduces a dynamic convolution kernel strategy for neural networks that enables real-time, high-quality audio-driven talking face video generation, demonstrating robustness across various identities and conditions.

Contribution

The paper proposes a novel dynamic convolution kernel approach tailored for talking face video synthesis, improving quality and efficiency over existing methods.

Findings

01. Generates high-quality videos at 60 fps

02. Robust to different identities, head postures, and audio inputs

03. Outperforms state-of-the-art methods in quality and speed

Abstract

In this paper, we present a dynamic convolution kernel (DCK) strategy for convolutional neural networks. Using a fully convolutional network with the proposed DCKs, high-quality talking-face video can be generated from multi-modal sources (i.e., unmatched audio and video) in real time, and our trained model is robust to different identities, head postures, and audio inputs. Our proposed DCKs are specially designed for audio-driven talking face video generation, leading to a simple yet effective end-to-end system. We also provide a theoretical analysis to interpret why DCKs work. Experimental results show that our method can generate high-quality talking-face video with background at 60 fps. Comparisons with state-of-the-art methods demonstrate the superiority of our method.
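
The abstract does not say how the DCKs are computed, so the following is only a minimal sketch of the general idea of a dynamic convolution kernel: the convolution weights are not fixed after training but are predicted from the audio features for each frame. The linear predictor, the function names (`predict_kernel`, `conv2d`), and all shapes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def predict_kernel(audio_feat, weight, bias, out_ch, in_ch, k):
    """Hypothetical kernel predictor: a linear map from an audio feature
    vector to a full set of convolution weights (assumed, not the paper's)."""
    return (weight @ audio_feat + bias).reshape(out_ch, in_ch, k, k)

def conv2d(x, kernel):
    """Zero-padded 'same' cross-correlation.
    x: (in_ch, H, W), kernel: (out_ch, in_ch, k, k) with odd k."""
    out_ch, in_ch, k, _ = kernel.shape
    pad = k // 2
    height, width = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((out_ch, height, width))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) across all channels.
            window = xp[:, i:i + height, j:j + width]
            out += np.einsum("oc,chw->ohw", kernel[:, :, i, j], window)
    return out

rng = np.random.default_rng(0)
in_ch, out_ch, k, audio_dim = 3, 4, 3, 16
n_weights = out_ch * in_ch * k * k

# Parameters of the (assumed) predictor; in a real system these would be learned.
weight = rng.normal(scale=0.1, size=(n_weights, audio_dim))
bias = rng.normal(scale=0.1, size=n_weights)

frame = rng.normal(size=(in_ch, 8, 8))   # stand-in for one video frame
audio_a = rng.normal(size=audio_dim)     # stand-in audio features, clip A
audio_b = rng.normal(size=audio_dim)     # stand-in audio features, clip B

# Same frame, different audio: the kernel, and hence the output, changes.
out_a = conv2d(frame, predict_kernel(audio_a, weight, bias, out_ch, in_ch, k))
out_b = conv2d(frame, predict_kernel(audio_b, weight, bias, out_ch, in_ch, k))
print(out_a.shape)  # (4, 8, 8)
```

In this sketch the same input frame produces different feature maps for `audio_a` and `audio_b`, because the audio determines the kernel rather than being concatenated to the input. For scale, the 60 fps figure quoted in the abstract corresponds to a budget of about 16.7 ms per frame (1000 ms / 60).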

Taxonomy

Methods: Convolution