Lip Reading Using Convolutional Auto Encoders as Feature Extractor
Dharin Parekh, Ankitesh Gupta, Shharrnam Chhatpar, Anmol Yash Kumar, Manasi Kulkarni

TL;DR
This paper introduces a word-level lip-reading model that uses convolutional autoencoders for feature extraction, followed by an LSTM classifier, and reports 98% accuracy on MIRACL-VC1 against a previous benchmark of 93.4%.
Contribution
The paper presents a new word-level lip-reading model that employs convolutional autoencoders for feature extraction, outperforming existing models on benchmark datasets.
Findings
Achieved 98% accuracy on MIRACL-VC1, surpassing the 93.4% benchmark (Rekik et al., 2014).
Performed better than the baseline CNN and LSTM model on BBC's LRW dataset.
Demonstrated the effectiveness of autoencoder features in lip-reading tasks.
Abstract
Visual recognition of speech using lip movement is called lip-reading. Recent developments in this nascent field use different neural networks as feature extractors, whose outputs serve as input to a model that can map the temporal relationship and classify. Though end-to-end sentence-level lip-reading is the current trend, we propose a new model which employs word-level classification and breaks the set benchmarks for standard datasets. In our model we use convolutional autoencoders as feature extractors, whose outputs are then fed to a Long short-term memory model. We tested our proposed model on BBC's LRW, MIRACL-VC1 and GRID datasets. The model achieves a classification accuracy of 98% on MIRACL-VC1, compared to 93.4% for the set benchmark (Rekik et al., 2014). On BBC's LRW the proposed model performed better than the baseline model of convolutional neural networks and Long short-term memory…
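The data flow described in the abstract (per-frame convolutional features fed to an LSTM that classifies the whole clip as one word) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract gives no layer sizes or training details, so the frame size, kernel size, hidden size, vocabulary size, and all helper names below are assumptions chosen for illustration, and the weights are random and untrained. Only the encoder half of an autoencoder is shown, on the assumption that the decoder is needed only while the autoencoder is being trained to reconstruct frames.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols, scale=0.1):
    """Random weights; a real model would learn these."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode_frame(frame, kernel, stride=2):
    """Hypothetical encoder: one strided 2D convolution + ReLU, flattened."""
    k = len(kernel)
    h, w = len(frame), len(frame[0])
    feats = []
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            s = sum(kernel[a][b] * frame[i + a][j + b]
                    for a in range(k) for b in range(k))
            feats.append(max(0.0, s))
    return feats

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates are stacked in the order input, forget, output, candidate."""
    n = len(h)
    z = [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    o = [sigmoid(v) for v in z[2 * n:3 * n]]
    g = [math.tanh(v) for v in z[3 * n:4 * n]]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]

def classify_clip(frames, params):
    """Encode each frame, run the LSTM over the sequence, classify the final state."""
    h = [0.0] * params["hidden"]
    c = [0.0] * params["hidden"]
    for frame in frames:
        x = encode_frame(frame, params["kernel"])
        h, c = lstm_step(x, h, c, params["W"], params["U"], params["b"])
    return softmax(matvec(params["V"], h))

# Toy sizes (assumptions, not taken from the paper).
FRAME, KERNEL, HIDDEN, WORDS, T = 8, 3, 6, 10, 5
feat_dim = len(range(0, FRAME - KERNEL + 1, 2)) ** 2  # conv positions per frame

params = {
    "hidden": HIDDEN,
    "kernel": rand_matrix(KERNEL, KERNEL, scale=1.0),
    "W": rand_matrix(4 * HIDDEN, feat_dim),
    "U": rand_matrix(4 * HIDDEN, HIDDEN),
    "b": [0.0] * (4 * HIDDEN),
    "V": rand_matrix(WORDS, HIDDEN),
}

# A fake clip: T grayscale mouth-region frames of FRAME x FRAME pixels.
clip = [[[random.random() for _ in range(FRAME)] for _ in range(FRAME)]
        for _ in range(T)]
probs = classify_clip(clip, params)
predicted_word = probs.index(max(probs))
```

Here `probs` holds one probability per word class for the whole clip, which is what makes the setup word-level: the LSTM consumes the full frame sequence and emits a single label rather than a sentence transcript.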
