Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

Jiadong Wang; Xinyuan Qian; Malu Zhang; Robby T. Tan; Haizhou Li

arXiv:2303.17480 · cs.CV · March 31, 2023 · 5 cites

TL;DR

This paper introduces a novel approach for talking face generation that incorporates a lip-reading expert to improve the intelligibility of lip movements, achieving state-of-the-art results in lip-reading accuracy and synchronization.

Contribution

The paper uses a lip-reading expert, together with contrastive learning and a transformer audio encoder, to improve both the intelligibility of the generated lip movements and lip-speech synchronization in speech-driven talking face generation.
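
As a rough illustration of the contrastive idea, the sketch below computes an InfoNCE-style loss in which each audio embedding should score highest against the lip embedding from the same time window, with the other windows acting as negatives. It is a minimal sketch under assumptions, not the paper's loss. The embeddings are taken as given, and although the abstract says the contrastive learning is built with the lip-reading expert, this summary does not say how. The names `cosine` and `info_nce` and the temperature of 0.1 are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(audio: List[Vector], lips: List[Vector], temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE loss: audio[i] and lips[i] come from the same time
    window (positive pair); every lips[j] with j != i is a negative for audio[i]."""
    if len(audio) != len(lips) or not audio:
        raise ValueError("need the same non-zero number of audio and lip embeddings")
    total = 0.0
    for i, a in enumerate(audio):
        # similarity of this audio window to every lip window, scaled by temperature
        logits = [cosine(a, v) / temperature for v in lips]
        # cross-entropy with the matching window as the target class,
        # using a max-shifted log-sum-exp for numerical stability
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(audio)


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    synced = [[0.9, 0.1], [0.1, 0.9]]    # lip embeddings aligned with the audio
    shuffled = [synced[1], synced[0]]    # same embeddings, wrong time alignment
    print(info_nce(audio, synced) < info_nce(audio, shuffled))  # True
```

Because the synchronized pairs put the highest similarity on the matching window, their loss is far lower than that of the shuffled pairing, which is the signal a synchronization objective exploits.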

Findings

1. Over 38% WER on the LRS2 dataset
2. 27.8% accuracy on the LRW dataset
3. State-of-the-art lip-speech synchronization
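
For reference, word error rate (WER) is conventionally the word-level edit distance between a hypothesis transcript and the reference, divided by the number of reference words (substitutions + deletions + insertions over N). Lower WER means more words were recovered correctly. The sketch below shows that standard computation only. It is not the paper's evaluation code. When evaluating talking faces, the hypothesis would be a lip reader's transcript of a generated video; here plain strings stand in for it. Whitespace tokenization with no text normalization is an assumption, and `word_error_rate` is a hypothetical name.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # one substitution (sat -> sit) and one deletion (the) over 6 reference words
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Since only reference words appear in the denominator, WER can exceed 1.0 when the hypothesis contains many inserted words.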

Abstract

Talking face generation, also known as speech-to-lip generation, reconstructs lip-related facial motions from coherent speech input. Previous studies have established the importance of lip-speech synchronization and visual quality. Despite much progress, they rarely focus on the content of lip movements, i.e., the visual intelligibility of the spoken words, which is an important aspect of generation quality. To address this problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing incorrect generation results. Moreover, to compensate for data scarcity, we train the lip-reading expert in an audio-visual self-supervised manner. With the lip-reading expert, we propose a novel contrastive learning scheme to enhance lip-speech synchronization, and a transformer that encodes audio synchronously with video while considering global…
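
The core idea of a lip-reading expert that penalizes incorrect generation results can be illustrated by scoring generated frames with a reader's predicted token probabilities. The sketch below assumes a frozen lip reader has already produced one probability distribution per transcript token for the generated video. It adds the mean negative log-likelihood of the true tokens to a reconstruction term. The per-token formulation, the dictionary format, `expert_penalty`, `generator_loss`, and `weight` are all hypothetical, and this summary does not give the paper's actual expert, its output format, or its loss weighting. In real training the same computation would run inside an autodiff framework so that gradients flow through the frozen reader into the generator. This plain-Python version only shows the arithmetic.

```python
import math
from typing import Dict, List

# Hypothetical format: one probability distribution over vocabulary tokens per
# transcript position, as a frozen lip reader might output for *generated* frames.
TokenProbs = List[Dict[str, float]]


def expert_penalty(token_probs: TokenProbs, transcript: List[str]) -> float:
    """Mean negative log-likelihood of the ground-truth tokens under the
    lip reader's predictions (hypothetical per-token formulation)."""
    if len(token_probs) != len(transcript) or not transcript:
        raise ValueError("need one distribution per transcript token")
    eps = 1e-12  # floor probabilities to avoid log(0)
    nll = [-math.log(max(dist.get(tok, 0.0), eps))
           for dist, tok in zip(token_probs, transcript)]
    return sum(nll) / len(nll)


def generator_loss(reconstruction: float, token_probs: TokenProbs,
                   transcript: List[str], weight: float = 1.0) -> float:
    """Hypothetical total objective: a reconstruction term plus a weighted
    penalty that grows when the generated lips are misread."""
    return reconstruction + weight * expert_penalty(token_probs, transcript)


if __name__ == "__main__":
    transcript = ["hi", "there"]
    readable = [{"hi": 0.9, "ho": 0.1}, {"there": 0.8, "their": 0.2}]
    ambiguous = [{"hi": 0.5, "ho": 0.5}, {"there": 0.5, "their": 0.5}]
    print(expert_penalty(readable, transcript) < expert_penalty(ambiguous, transcript))  # True
```

The reader decodes confident lip shapes more reliably than ambiguous ones, so confident shapes yield a smaller penalty. Minimizing the total loss therefore pushes the generator toward intelligible lip shapes, not just toward pixel-level similarity.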

Code & Models

Repositories

sxjdwang/talklip
PyTorch · Official implementation

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis

Methods: Contrastive Learning