Learn an Effective Lip Reading Model without Pains
Dalu Feng, Shuang Yang, Shiguang Shan, Xilin Chen

TL;DR
This paper demonstrates that careful use of training strategies and simple refinements can significantly improve lip reading accuracy, matching or surpassing state-of-the-art results without complex model changes.
Contribution
The study provides a comprehensive analysis of training strategies' effects on lip reading performance, highlighting simple refinements that enhance accuracy.
Findings
Accuracy improved from 83.7% to 88.4% on the LRW dataset.
Accuracy improved from 38.2% to 55.7% on the LRW-1000 dataset.
The refinements achieved results comparable or superior to the existing state of the art.
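To put the two reported improvements on a common scale, the short sketch below computes the absolute gain in percentage points and the relative reduction in error rate for each dataset. It assumes the reported figures are accuracy percentages, so that error is 100 minus accuracy; the function name `accuracy_gain` is a hypothetical helper introduced here for illustration and is not from the paper.

```python
def accuracy_gain(before: float, after: float) -> dict:
    """Summarize an accuracy improvement.

    Assumes `before` and `after` are accuracy percentages in [0, 100),
    so the error rate is taken to be 100 - accuracy.
    """
    absolute = after - before                      # gain in percentage points
    error_before = 100.0 - before                  # error rate before refinements
    relative = 100.0 * absolute / error_before     # share of the error removed, in %
    return {
        "absolute_points": round(absolute, 1),
        "relative_error_reduction_pct": round(relative, 1),
    }


# Figures taken from the Findings list above.
reported = {"LRW": (83.7, 88.4), "LRW-1000": (38.2, 55.7)}

for dataset, (before, after) in reported.items():
    print(dataset, accuracy_gain(before, after))
```

Under that assumption, the gains of 4.7 and 17.5 percentage points each correspond to removing somewhat more than a quarter of the original errors, which makes the two datasets easier to compare than the raw numbers alone.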
Abstract
Lip reading, also known as visual speech recognition, aims to recognize speech content from videos by analyzing lip dynamics. There has been appealing progress in recent years, benefiting much from rapidly developing deep learning techniques and recent large-scale lip-reading datasets. Most existing methods obtained high performance by constructing a complex neural network, together with several customized training strategies which were always described very briefly or even shown only in the source code. We find that making proper use of these strategies could always bring exciting improvements without changing much of the model. Considering the non-negligible effects of these strategies and the current difficulty of training an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to…
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Indoor and Outdoor Localization Technologies
