Learn an Effective Lip Reading Model without Pains

Dalu Feng; Shuang Yang; Shiguang Shan; Xilin Chen

arXiv:2011.07557 · cs.CV · November 17, 2020 · 51 cites

TL;DR

This paper shows that carefully chosen training strategies and simple refinements can substantially improve lip reading accuracy, reaching results comparable or superior to the state of the art without complex model changes.

Contribution

The study provides a comprehensive quantitative analysis of how training strategies affect lip reading performance, and identifies simple refinements that improve accuracy.

Findings

01 Accuracy improved from 83.7% to 88.4% on the LRW dataset.

02 Accuracy improved from 38.2% to 55.7% on the LRW-1000 dataset.

03 The refinements achieved results comparable or superior to the existing state of the art.
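
To put the two reported improvements on a common footing, the sketch below computes the absolute gain in percentage points and the relative reduction in error rate for each dataset. The before/after figures come from the findings above; the helper name `gains` and the error-reduction measure are illustrative assumptions and are not reported in the source.

```python
from typing import Tuple


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in points, relative error reduction in %).

    Hypothetical helper for illustration; both inputs are accuracies in percent.
    """
    absolute = after - before
    error_before = 100.0 - before  # error rate before the refinements
    relative = 100.0 * absolute / error_before
    return round(absolute, 1), round(relative, 1)


# Accuracy figures as listed in the findings (before, after).
results = {"LRW": (83.7, 88.4), "LRW-1000": (38.2, 55.7)}

for name, (before, after) in results.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points, {relative}% relative error reduction")
```

Under this measure the absolute gain is much larger on LRW-1000, while the fraction of errors removed is of similar size on both datasets.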

Abstract

Lip reading, also known as visual speech recognition, aims to recognize speech content from videos by analyzing lip dynamics. There has been appealing progress in recent years, benefiting much from rapidly developing deep learning techniques and recent large-scale lip reading datasets. Most existing methods obtained high performance by constructing a complex neural network, together with several customized training strategies which were always described very briefly or even shown only in the source code. We find that making proper use of these strategies could always bring exciting improvements without changing much of the model. Considering the non-negligible effects of these strategies and the current difficulty of training an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to…

Code & Models

Repositories

Fengdalu/learn-an-effective-lip-reading-model-without-pains
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Indoor and Outdoor Localization Technologies