Towards Pose-invariant Lip-Reading

Shiyang Cheng; Pingchuan Ma; Georgios Tzimiropoulos; Stavros Petridis; Adrian Bulat; Jie Shen; Maja Pantic

arXiv:1911.06095 · cs.CV · November 15, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a pose-invariant lip-reading framework using synthetic data generated by a 3D Morphable Model, significantly improving recognition accuracy across various mouth poses.

Contribution

Using a 3D Morphable Model (3DMM) to generate synthetic multi-pose facial data for training lip-reading models, which improves performance on non-frontal views.

Findings

01. Outperforms previous methods on non-frontal views
02. Achieves up to 20.64% improvement in extreme poses
03. Improves cross-database word recognition accuracy by 2.55%

Abstract

Lip-reading models have improved significantly in recent years thanks to powerful deep learning architectures. However, most work has focused on frontal or near-frontal views of the mouth, and as a consequence, lip-reading performance deteriorates severely on non-frontal views. In this work, we present a framework for training pose-invariant lip-reading models on synthetic data, instead of collecting and annotating non-frontal data, which is costly and tedious. The proposed model significantly outperforms previous approaches on non-frontal views while retaining its superior performance on frontal and near-frontal mouth views. Specifically, we propose to use a 3D Morphable Model (3DMM) to augment LRW, an existing large-scale but mostly frontal dataset, by generating synthetic facial data in arbitrary poses. The newly derived dataset is used to train a state-of-the-art neural network for…
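The abstract describes using a 3DMM to augment LRW with faces in arbitrary poses. At the geometry level, a common way to do this is to build a face shape from a linear 3DMM (a mean shape plus weighted identity and expression bases), rotate it to a new head pose, and project it back to the image plane. The sketch below illustrates only that geometric step, assuming a generic linear 3DMM, a yaw-only rotation, and a weak-perspective camera; the function names `linear_3dmm` and `project_at_yaw` and the toy data are hypothetical, and fitting the model to video frames, texture rendering, and mouth cropping are omitted. It is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def linear_3dmm(mean: Sequence[float],
                id_basis: Sequence[Sequence[float]], id_coeffs: Sequence[float],
                exp_basis: Sequence[Sequence[float]], exp_coeffs: Sequence[float]) -> List[Vertex]:
    """Hypothetical helper: S = S_mean + sum_i alpha_i * U_id[i] + sum_j beta_j * U_exp[j].

    Shapes are stored flat as [x0, y0, z0, x1, y1, z1, ...] (an assumption of this sketch).
    """
    flat = list(mean)
    for coeffs, basis in ((id_coeffs, id_basis), (exp_coeffs, exp_basis)):
        for c, component in zip(coeffs, basis):
            for k in range(len(flat)):
                flat[k] += c * component[k]
    # Regroup the flat vector into (x, y, z) vertices.
    return [(flat[k], flat[k + 1], flat[k + 2]) for k in range(0, len(flat), 3)]


def yaw_matrix(yaw_deg: float):
    """Rotation about the vertical (y) axis by yaw_deg degrees."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def project_at_yaw(vertices: Sequence[Vertex], yaw_deg: float,
                   scale: float = 1.0, tx: float = 0.0, ty: float = 0.0) -> List[Tuple[float, float]]:
    """Hypothetical helper: rotate vertices by yaw, then weak-perspective project (drop z, scale, shift)."""
    R = yaw_matrix(yaw_deg)
    projected = []
    for v in vertices:
        rx = sum(R[0][i] * v[i] for i in range(3))
        ry = sum(R[1][i] * v[i] for i in range(3))
        projected.append((scale * rx + tx, scale * ry + ty))
    return projected


# Toy example: two vertices, one identity component and one expression component.
mean = [1.0, 0.0, 0.0, 0.0, -1.0, 0.5]
id_basis = [[0.1, 0.0, 0.0, 0.0, 0.0, 0.0]]
exp_basis = [[0.0, 0.0, 0.0, 0.0, -0.2, 0.0]]  # toy "mouth opening" on the second vertex
verts = linear_3dmm(mean, id_basis, [1.0], exp_basis, [1.0])

frontal = project_at_yaw(verts, 0.0)   # same face viewed head-on
profile = project_at_yaw(verts, 90.0)  # same face re-posed to a full profile
```

Sampling many yaw angles per training face, rather than a single one, is what would turn a mostly frontal dataset into a multi-pose one; the identity and expression coefficients stay fixed, so the speech-related mouth shape is preserved across poses.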

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Facial Nerve Paralysis Treatment and Research