Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer

Ailin Huang; Zhewei Huang; Shuchang Zhou

arXiv:2206.12837 · cs.CV · August 3, 2022

TL;DR

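This paper describes the authors' solution to the ACM Multimedia ViCo 2022 Conversational Head Generation Challenge. It generates conversational head videos from audio and reference images using a regularized audio-to-head driver and an enhanced renderer, placing first in the listening head generation track and second in the talking head generation track.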

Contribution

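The paper trains a generalized audio-to-head driver with regularization, assembles a renderer with high visual quality, and post-processes the generated video with a foreground-background fusion module, with the aim of improving the realism and consistency of the generated conversational videos.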

Findings

01

Achieved first place in the listening head generation track on the official leaderboard

02

Secured second place in the talking head generation track on the official leaderboard

03

Produced conversational videos with high visual quality

Abstract

This paper reports our solution for the ACM Multimedia ViCo 2022 Conversational Head Generation Challenge, which aims to generate vivid face-to-face conversation videos based on audio and reference images. Our solution focuses on training a generalized audio-to-head driver using regularization and assembling a renderer with high visual quality. We carefully tune the audio-to-behavior model and post-process the generated video using our foreground-background fusion module. We placed first in the listening head generation track and second in the talking head generation track on the official leaderboard. Our code is available at https://github.com/megvii-research/MM2022-ViCoPerceptualHeadGeneration.
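
The abstract does not say how the foreground-background fusion module works. As a rough illustration of the general idea only, the sketch below blends a generated head frame onto a background frame with a soft mask (plain per-pixel alpha compositing). The function name `fuse_frame`, the grayscale list-of-rows frame layout, and the use of alpha compositing itself are all assumptions made for this example, not the authors' implementation; the linked repository is the place to check the actual method.

```python
# Hypothetical sketch: per-pixel alpha compositing of a generated head
# (foreground) onto a background frame. This is NOT the authors' fusion
# module; it only illustrates the general idea of mask-based blending.

def fuse_frame(foreground, background, mask):
    """Blend two grayscale frames using a soft mask with values in [0, 1].

    All three arguments are equally sized lists of rows of floats.
    A mask value of 1 keeps the foreground pixel; 0 keeps the background.
    """
    fused = []
    for fg_row, bg_row, m_row in zip(foreground, background, mask):
        fused.append([m * f + (1.0 - m) * b
                      for f, b, m in zip(fg_row, bg_row, m_row)])
    return fused


# Toy 2x2 frames: an all-white foreground over an all-black background.
foreground = [[1.0, 1.0], [1.0, 1.0]]
background = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.5], [0.0, 0.25]]

fused = fuse_frame(foreground, background, mask)
```

With a white foreground and a black background, each fused pixel equals its mask value, which makes the blend easy to inspect by eye. A real pipeline would operate on color image arrays and would need a way to estimate the mask, which this sketch leaves out.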

Code & Models

Repositories

megvii-research/MM2022-ViCoPerceptualHeadGeneration
PyTorch · Official
