Memory-augmented Contrastive Learning for Talking Head Generation

Jianrong Wang; Yaxin Zhao; Li Liu; Hongkai Fan; Tianyi Xu; Qi Li; Sen Li

arXiv:2302.13469 · cs.MM · February 28, 2023 · 1 citation

Open Access · 1 repository

TL;DR

This paper introduces a memory-augmented contrastive learning approach for talking head generation, enabling the synthesis of realistic, lip-synchronized videos with natural head movements by modeling multiple speech-to-face mappings.

Contribution

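The paper proposes a Speech Feature Extractor (SFE) trained with memory-augmented self-supervised contrastive learning and applies Mixture Density Networks (MDN) to facial landmark prediction, so that one speech clip can map to several plausible landmark sequences. The authors report improved facial animation quality over existing methods.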
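To make the MDN idea concrete, here is a minimal NumPy sketch of the standard mixture density objective: the network outputs the parameters of a Gaussian mixture over landmark vectors instead of a single point estimate. This is not taken from the paper or its repository. The function names `mdn_nll` and `mdn_sample`, the isotropic covariance, and all tensor shapes are assumptions chosen for illustration.

```python
import numpy as np

def mdn_nll(pi_logits, mu, log_sigma, target):
    """Negative log-likelihood of `target` under an isotropic Gaussian mixture.

    pi_logits: (K,)   unnormalised mixture weights
    mu:        (K, D) component means (e.g. flattened landmark coordinates)
    log_sigma: (K,)   log standard deviation of each component
    target:    (D,)   ground-truth landmark vector
    """
    _, dim = mu.shape
    # Log-softmax turns the logits into log mixture weights.
    log_pi = pi_logits - np.logaddexp.reduce(pi_logits)
    # Squared distance from the target to every component mean.
    sq_dist = np.sum((target - mu) ** 2, axis=1)
    # Log density of an isotropic Gaussian, one value per component.
    log_prob = (-0.5 * sq_dist / np.exp(2.0 * log_sigma)
                - dim * log_sigma
                - 0.5 * dim * np.log(2.0 * np.pi))
    # Log-sum-exp over components gives the mixture log-likelihood.
    return -np.logaddexp.reduce(log_pi + log_prob)

def mdn_sample(pi_logits, mu, log_sigma, rng):
    """Draw one landmark vector: pick a component, then add Gaussian noise."""
    pi = np.exp(pi_logits - np.logaddexp.reduce(pi_logits))
    k = rng.choice(len(pi), p=pi)
    return mu[k] + np.exp(log_sigma[k]) * rng.standard_normal(mu.shape[1])
```

Because the loss sums over components, two different landmark configurations can both be assigned high likelihood for the same input, which is the property a one-to-many speech-to-face mapping needs. Sampling from different components then yields different predicted landmark sets.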

Findings

01. Facial animation quality is reported as significantly better than state-of-the-art (SOTA) methods

02. Effective modeling of multiple speech-to-face mappings

03. Enhanced lip synchronization and natural head movements

Abstract

Given one reference facial image and a piece of speech as input, talking head generation aims to synthesize a realistic-looking talking head video. However, generating a lip-synchronized video with natural head movements is challenging: the same speech clip can correspond to multiple plausible lip and head movements, that is, there is no one-to-one mapping between speech and facial motion. To overcome this problem, we propose a Speech Feature Extractor (SFE) based on memory-augmented self-supervised contrastive learning, which introduces a memory module to store multiple different speech mapping results. In addition, we introduce Mixture Density Networks (MDN) into the landmark regression task to generate multiple predicted facial landmarks. Extensive qualitative and quantitative experiments show that the quality of our facial animation is significantly superior to that of the state-of-the-art…
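The abstract does not specify how the memory module or the contrastive objective are implemented. As a rough illustration only, the sketch below combines two generic building blocks that fit the description: an attention-style read over a bank of memory slots, and an InfoNCE-style loss that pulls matching speech and face embeddings together within a batch. The names `read_memory` and `info_nce`, the cosine similarity, and the `temperature` value are all assumptions, not the authors' design.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def read_memory(query, memory, temperature=0.1):
    """Attention read: query (B, D), memory (M, D) -> read vectors (B, D), weights (B, M).

    Each query attends over all memory slots, so one speech feature can
    retrieve a blend of several stored mapping patterns.
    """
    sim = l2_normalize(query) @ l2_normalize(memory).T
    weights = softmax(sim / temperature, axis=1)
    return weights @ memory, weights

def info_nce(speech, face, temperature=0.1):
    """InfoNCE loss over a batch: row i of `speech` is the positive for row i of `face`.

    All other rows in the batch act as negatives.
    """
    logits = l2_normalize(speech) @ l2_normalize(face).T / temperature  # (B, B)
    log_probs = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    return -np.mean(np.diag(log_probs))
```

In a setup like this, the speech embedding would typically be passed through `read_memory` before the contrastive loss is computed, so that the memory slots are trained jointly with the encoder. Whether the paper follows that ordering is not stated on this page.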

Code & Models

Repositories

yaxinzhao97/macl (PyTorch, official)

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing