Towards Pose-invariant Lip-Reading
Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic

TL;DR
This paper introduces a pose-invariant lip-reading framework using synthetic data generated by a 3D Morphable Model, significantly improving recognition accuracy across various mouth poses.
Contribution
Uses a 3D Morphable Model (3DMM) to generate synthetic multi-pose facial data for training lip-reading models, improving performance on non-frontal views.
Findings
Outperforms previous methods on non-frontal views
Achieves up to 20.64% improvement in extreme poses
Improves cross-database word recognition accuracy by 2.55%
Abstract
Lip-reading models have been significantly improved recently thanks to powerful deep learning architectures. However, most works have focused on frontal or near-frontal views of the mouth. As a consequence, lip-reading performance seriously deteriorates in non-frontal mouth views. In this work, we present a framework for training pose-invariant lip-reading models on synthetic data instead of collecting and annotating non-frontal data, which is costly and tedious. The proposed model significantly outperforms previous approaches on non-frontal views while retaining the superior performance on frontal and near-frontal mouth views. Specifically, we propose to use a 3D Morphable Model (3DMM) to augment LRW, an existing large-scale but mostly frontal dataset, by generating synthetic facial data in arbitrary poses. The newly derived dataset is used to train a state-of-the-art neural network for…
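The geometric idea behind pose augmentation can be illustrated with a minimal sketch. It assumes a face is already available as a set of 3D points (as a fitted 3DMM would provide) and only shows how a yaw rotation followed by an orthographic projection changes the 2D mouth shape. The function names, the two example points, and the choice of orthographic projection are illustrative assumptions, not the authors' pipeline, which also involves model fitting and texture rendering.

```python
import math

def yaw_rotation(deg):
    """Return a 3x3 rotation matrix about the vertical (y) axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def rotate_and_project(vertices, yaw_deg):
    """Rotate 3D points by a yaw angle, then drop z (orthographic projection)."""
    R = yaw_rotation(yaw_deg)
    projected = []
    for x, y, z in vertices:
        xr = R[0][0] * x + R[0][1] * y + R[0][2] * z
        yr = R[1][0] * x + R[1][1] * y + R[1][2] * z
        projected.append((xr, yr))
    return projected

# Hypothetical mouth-corner points (x, y, z) in a face-centred frame.
mouth = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frontal = rotate_and_project(mouth, 0.0)   # unchanged frontal view
profile = rotate_and_project(mouth, 60.0)  # mouth appears narrower
```

In this toy setting the projected mouth width shrinks with the cosine of the yaw angle, which is one reason a model trained only on frontal views may struggle on rotated ones.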
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Facial Nerve Paralysis Treatment and Research
