DualLip: A System for Joint Lip Reading and Generation
Weicong Chen, Xu Tan, Yingce Xia, Tao Qin, Yu Wang, Tie-Yan Liu

TL;DR
DualLip is a system that jointly enhances lip reading and lip generation by leveraging task duality and unlabeled data, significantly improving performance in talking face applications.
Contribution
It introduces a dual learning framework for lip reading and generation that effectively utilizes unlabeled data, surpassing state-of-the-art results with limited paired data.
Findings
With only 10% of the paired data (plus unlabeled data), the lip generation model outperforms one trained on the full paired data.
Achieves 1.16% CER and 2.71% WER on the GRID lip reading benchmark.
Improves talking face generation by building on the dual lip reading and lip generation tasks.
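CER and WER are edit-distance metrics. The Levenshtein distance (substitutions + deletions + insertions) between the reference and the predicted transcript is divided by the reference length, measured in characters for CER and in words for WER. The sketch below is a minimal illustration of these standard definitions. It is not the paper's evaluation script, and scoring conventions vary (for example, whether spaces count as characters). The example sentence is an illustrative GRID-style sentence, not data from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if r == h)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length (spaces counted here)."""
    return edit_distance(reference, hypothesis) / len(reference)


ref = "bin blue at f two now"  # GRID-style sentence, illustrative only
hyp = "bin blue at e two now"  # one wrong letter
print(f"WER = {wer(ref, hyp):.4f}")  # 1 of 6 words wrong -> 0.1667
print(f"CER = {cer(ref, hyp):.4f}")  # 1 of 21 characters wrong -> 0.0476
```

A single wrong letter costs one whole word under WER but only one character under CER. This is why reported WER is typically higher than CER for the same model.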
Abstract
Lip reading aims to recognize text from talking lip, while lip generation aims to synthesize talking lip according to text, which is a key component in talking face generation and is a dual task of lip reading. In this paper, we develop DualLip, a system that jointly improves lip reading and generation by leveraging the task duality and using unlabeled text and lip video data. The key ideas of the DualLip include: 1) Generate lip video from unlabeled text with a lip generation model, and use the pseudo pairs to improve lip reading; 2) Generate text from unlabeled lip video with a lip reading model, and use the pseudo pairs to improve lip generation. We further extend DualLip to talking face generation with two additionally introduced components: lip to face generation and text to speech generation. Experiments on GRID and TCD-TIMIT demonstrate the effectiveness of DualLip on improving…
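The two key ideas in the abstract describe a data flow. Each model labels the other task's unlabeled data, and the resulting pseudo pairs augment the real paired data. The following is a minimal, framework-free sketch of that flow only, under stated assumptions. `build_training_sets`, `train_lookup`, `toy_generator`, `toy_reader`, and the tuple-of-integers "video" type are hypothetical placeholders, not the paper's neural models, losses, or training schedule.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, TypeVar

Text = str
Video = Tuple[int, ...]  # hypothetical stand-in for a lip video (e.g. per-frame features)
LipReader = Callable[[Video], Text]     # lip reading: video -> text
LipGenerator = Callable[[Text], Video]  # lip generation: text -> video

K = TypeVar("K")
V = TypeVar("V")


def build_training_sets(
    paired: Sequence[Tuple[Text, Video]],
    unlabeled_text: Sequence[Text],
    unlabeled_video: Sequence[Video],
    reader: LipReader,
    generator: LipGenerator,
) -> Tuple[List[Tuple[Video, Text]], List[Tuple[Text, Video]]]:
    """Augment the real (text, video) pairs with pseudo pairs in both directions."""
    # Key idea 1: lip generation turns unlabeled text into pseudo video;
    # (pseudo video, real text) pairs become extra lip reading data.
    reading_data = [(video, text) for text, video in paired]
    reading_data += [(generator(text), text) for text in unlabeled_text]
    # Key idea 2: lip reading turns unlabeled video into pseudo text;
    # (pseudo text, real video) pairs become extra lip generation data.
    generation_data = list(paired)
    generation_data += [(reader(video), video) for video in unlabeled_video]
    return reading_data, generation_data


def train_lookup(pairs: Sequence[Tuple[K, V]]) -> Callable[[K], Optional[V]]:
    """Hypothetical toy 'trainer': memorizes input -> output; a real system trains neural models."""
    table: Dict[K, V] = dict(pairs)
    return table.get


# Hypothetical toy models: a "video" is just the character codes of the spoken text.
def toy_generator(text: Text) -> Video:
    return tuple(ord(c) for c in text)


def toy_reader(video: Video) -> Text:
    return "".join(chr(x) for x in video)


paired = [("bin blue", toy_generator("bin blue"))]
reading_data, generation_data = build_training_sets(
    paired,
    unlabeled_text=["lay red"],
    unlabeled_video=[toy_generator("set green")],
    reader=toy_reader,
    generator=toy_generator,
)
new_reader = train_lookup(reading_data)
new_generator = train_lookup(generation_data)
print(new_reader(toy_generator("lay red")))  # lay red (learned from unlabeled text)
print(new_generator("set green") == toy_generator("set green"))  # True (learned from unlabeled video)
```

The toy models here are exact, so every pseudo pair is correct. With real models the pseudo pairs are noisy. One would then presumably retrain both models and regenerate pseudo pairs over several rounds, so that each improved model gives the other better training data.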
